Abstract The frequencies $X_1, X_2, \ldots$ of an exchangeable Gibbs random partition $\Pi$ of $\mathbb{N} = \{1, 2, \ldots\}$
(Gnedin and Pitman (2006)) are considered in their age-order, i.e. their size-biased order. We study their
dependence on the sequence $i_1, i_2, \ldots$ of least elements of the blocks of $\Pi$. In particular, conditioning on
$1 = i_1 < i_2 < \cdots$, a representation is shown to be

$$X_j = \xi_{j-1} \prod_{i=j}^{\infty} (1 - \xi_i), \qquad j = 1, 2, \ldots,$$

where $\xi_0 = 1$ and $\{\xi_j : j = 1, 2, \ldots\}$ is a sequence of independent Beta random variables. Sequences with such a
product form are called neutral to the left. We show that the property of conditional left-neutrality in
fact characterizes the Gibbs family among all exchangeable partitions, and leads to further interesting
results on: (i) the conditional Mellin transform of $X_k$, given $i_k$, and (ii) the conditional distribution
of the first $k$ normalized frequencies, given $\sum_{j=1}^{k} X_j$ and $i_k$; the latter turns out to be a mixture of
Dirichlet distributions. Many of the mentioned representations are extensions of Griffiths and Lessard
(2005) results on Ewens' partitions.

1 Introduction.

A random partition of $[n] = \{1, \ldots, n\}$ is a random collection $\Pi_n = \{\Pi_{n1}, \ldots, \Pi_{nk}\}$ of disjoint nonempty
subsets of $[n]$ whose union is $[n]$. The classes of $\Pi_n$ are conventionally ordered by their least elements
$1 = i_1 < i_2 < \cdots < i_k \le n$. We call $\{i_j\}$ the sequence of record indices of $\Pi_n$, and define the age-ordered
frequencies of $\Pi_n$ to be the vector $\mathbf{n} = (n_1, \ldots, n_k)$ such that $n_j$ is the cardinality of $\Pi_{nj}$. Consistent
Markov partitions $\Pi = (\Pi_1, \Pi_2, \ldots)$ can be generated by a set of predictive distributions specifying, for
each $n$, how $\Pi_{n+1}$ is likely to extend $\Pi_n$, that is: given $\Pi_n$, a conditional probability is assigned for the
integer $(n + 1)$ to join any particular class of $\Pi_n$ or to start a new class.

We consider a family of consistent random partitions studied by Gnedin and Pitman [13] which can
be defined by the following prediction rule: (i) set $\Pi_1 = (\{1\})$; (ii) for each $n \ge 1$, conditional on
$\Pi_n = (\pi_{n1}, \ldots, \pi_{nk})$, the probability that $(n + 1)$ starts a new class is

$$\frac{V_{n+1,k+1}}{V_{n,k}}; \tag{1}$$

otherwise, if $n_j$ is the cardinality of $\pi_{nj}$ $(j = 1, \ldots, k)$, the probability that $(n + 1)$ falls in the $j$-th
"old" class $\pi_{nj}$ is

$$\frac{n_j - \alpha}{n - \alpha k} \left(1 - \frac{V_{n+1,k+1}}{V_{n,k}}\right), \tag{2}$$

for some $\alpha \in (-\infty, 1]$ and a sequence of coefficients $V = (V_{n,k} : k \le n = 1, 2, \ldots)$ satisfying the recursion:

$$V_{1,1} = 1; \qquad V_{n,k} = (n - \alpha k) V_{n+1,k} + V_{n+1,k+1}. \tag{3}$$

Every partition $\Pi$ of $\mathbb{N}$ so generated is called an exchangeable Gibbs partition with parameters $(\alpha, V)$
(EGP$(\alpha, V)$), where exchangeable means that, for every $n$, the distribution of $\Pi_n$ is a symmetric function
of the vector $\mathbf{n} = (n_1, \ldots, n_k)$ of its frequencies ([H]) (see section 2 below). Actually, the whole family
of EGPs, treated in [13], includes also the value $\alpha = -\infty$, for which the definition (1)-(3) should be
modified; this case will not be treated in the present paper.

A special subfamily of EGPs is Pitman's two-parameter family, for which $V$ is given by

$$V^{(\alpha,\theta)}_{n,k} = \frac{\prod_{j=1}^{k} (\theta + \alpha(j-1))}{\theta_{(n)}}, \tag{4}$$

where either $\alpha \in [0, 1]$ and $\theta > -\alpha$ or $\alpha < 0$ and $\theta = m|\alpha|$ for some integer $m$. Here and in the following
sections, $a_{(x)}$ will denote the generalized increasing factorial, i.e. $a_{(x)} = \Gamma(a + x)/\Gamma(a)$, where $\Gamma(\cdot)$ is the
Gamma function.
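As a quick numerical illustration, the following sketch checks that two-parameter coefficients of the form $\prod_{j=1}^{k}(\theta + \alpha(j-1))/\theta_{(n)}$ satisfy the recursion $V_{n,k} = (n - \alpha k)V_{n+1,k} + V_{n+1,k+1}$ with $V_{1,1} = 1$. The function names and the parameter values $\alpha = 0.5$, $\theta = 1$ are illustrative assumptions, and the code assumes $\theta > 0$ so that the Gamma function can be evaluated directly.

```python
from math import gamma, isclose


def rising(a, x):
    """Generalized increasing factorial a_(x) = Gamma(a + x) / Gamma(a)."""
    return gamma(a + x) / gamma(a)


def v_two_param(n, k, alpha, theta):
    """Two-parameter coefficient: prod_{j=1}^k (theta + alpha (j - 1)) / theta_(n)."""
    numerator = 1.0
    for j in range(1, k + 1):
        numerator *= theta + alpha * (j - 1)
    return numerator / rising(theta, n)


def recursion_gap(n, k, alpha, theta):
    """|V_{n,k} - ((n - alpha k) V_{n+1,k} + V_{n+1,k+1})|, which should vanish."""
    lhs = v_two_param(n, k, alpha, theta)
    rhs = ((n - alpha * k) * v_two_param(n + 1, k, alpha, theta)
           + v_two_param(n + 1, k + 1, alpha, theta))
    return abs(lhs - rhs)


if __name__ == "__main__":
    alpha, theta = 0.5, 1.0  # illustrative values only
    worst = max(recursion_gap(n, k, alpha, theta)
                for n in range(1, 12) for k in range(1, n + 1))
    print("largest recursion gap:", worst)
```

The check works because the bracket $(n - \alpha k) + (\theta + \alpha k) = n + \theta$ is exactly the factor that turns $\theta_{(n+1)}$ back into $\theta_{(n)}$.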

Pitman's family is characterized as the unique class of EGPs with $V$-coefficients of the form

$$V_{n,k} = \frac{v_k}{c_n}$$

for some sequence of constants $(c_n)$ ([13], Corollary 4). If we let $\alpha = 0$ in (4), we obtain the well known
Ewens' partition for which:

$$V^{(0,\theta)}_{n,k} = \frac{\theta^k}{\theta_{(n)}}. \tag{5}$$

Ewens' family arose in the context of Population Genetics to describe the properties of a population of
genes under the so-called infinitely-many-alleles model with parent-independent mutation (see e.g. [31],
[20]) and became a paradigm for the modern developments of a theory of exchangeable random partitions.
For every fixed $\alpha$, the set of all EGP$(\alpha, V)$ forms a convex set; Gnedin and Pitman proved it and gave
a complete description of the extreme points ([13], Theorem 12). It turns out, in particular, that for
every $\alpha \le 0$, the extreme set is given by Pitman's two-parameter family. For each $\alpha \in (0, 1)$, the extreme
points are all partitions of the so-called Poisson-Kingman type with parameters $(\alpha, s)$, $s > 0$, whose
$V$-coefficients are given by:

$$V_{n,k}(s) = \alpha^k s^{n/\alpha} G_\alpha(n - \alpha k, s^{-1/\alpha}), \tag{6}$$

with

$$G_\alpha(q, t) := \frac{1}{\Gamma(q) f_\alpha(t)} \int_0^t f_\alpha(t - v) v^{q-1} \, dv, \tag{7}$$

and $f_\alpha$ is an $\alpha$-stable density ([28], Theorem 4.5). The partition induced by (6) has limit frequencies
(ranked in a decreasing order) equal in distribution to the jump-sizes of the process $(S_t/S_1 : t \in [0, 1])$,
conditioned on $S_1 = s$, where $(S_t : t \ge 0)$ is a stable subordinator with density $f_\alpha$. The parameter $s$ has
the interpretation as the (a.s.) limit of the ratio $K_n/n^\alpha$ as $n \to \infty$, where $K_n$ is the number of classes
in the partition $\Pi_n$ generated via $V_{n,k}(s)$. $K_n$ is shown in [13] to play a central role in determining the
extreme set of $V_{n,k}$ for every $\alpha$; the distribution of $K_n$, for every $n$, turns out to be of the form

$$P(K_n = k) = V_{n,k} \left[{n \atop k}\right]_\alpha, \tag{8}$$

where $\left[{n \atop k}\right]_\alpha$ are generalized Stirling numbers, defined as the coefficients of $x^n$ in

$$\frac{n!}{\alpha^k k!} \left(1 - (1 - x)^\alpha\right)^k$$

(see [13] and reference therein). As $n \to \infty$, $K_n$ behaves differently for different choices of the parameter
$\alpha$: almost surely it will be finite for $\alpha < 0$, $K_n \sim S \log n$ for $\alpha = 0$ and $K_n \sim S n^\alpha$ for positive $\alpha$, for
some positive random variable $S$.
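The law of $K_n$ can be explored numerically. The sketch below is an illustration under two assumptions that are not taken from the text: the generalized Stirling numbers are generated by the triangular recursion $S(n+1, k) = (n - \alpha k) S(n, k) + S(n, k-1)$, $S(1,1) = 1$ (the recursion implied by the prediction rule, which for $\alpha = 0$ gives the unsigned Stirling numbers of the first kind), and $V$ is taken from the two-parameter family with $\theta > 0$. All function names are hypothetical.

```python
from math import gamma, isclose


def generalized_stirling(n_max, alpha):
    """Triangle S[n][k]: S[1][1] = 1, S[n+1][k] = (n - alpha k) S[n][k] + S[n][k-1]."""
    S = [[0.0] * (n_max + 2) for _ in range(n_max + 1)]
    S[1][1] = 1.0
    for n in range(1, n_max):
        for k in range(1, n + 2):
            S[n + 1][k] = (n - alpha * k) * S[n][k] + S[n][k - 1]
    return S


def v_two_param(n, k, alpha, theta):
    """Two-parameter coefficient prod_{j=1}^k (theta + alpha (j - 1)) / theta_(n)."""
    numerator = 1.0
    for j in range(1, k + 1):
        numerator *= theta + alpha * (j - 1)
    return numerator * gamma(theta) / gamma(theta + n)


def law_of_K(n, alpha, theta):
    """List of P(K_n = k) = V_{n,k} S(n, k) for k = 1..n."""
    S = generalized_stirling(n, alpha)
    return [v_two_param(n, k, alpha, theta) * S[n][k] for k in range(1, n + 1)]


if __name__ == "__main__":
    probabilities = law_of_K(8, 0.3, 1.5)  # illustrative parameters
    print(probabilities, sum(probabilities))
```

If the recursion assumption is right, the probabilities returned by `law_of_K` should add up to one for every $n$, which gives a cheap sanity check on both the coefficients and the Stirling triangle.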



In this paper we want to study how the distribution of the limit age-ordered frequencies $X_j = \lim_{n \to \infty} n_j/n$
$(j = 1, 2, \ldots)$ in an Exchangeable Gibbs partition depends on its record indices $\mathbf{i} = (1 = i_1 < i_2 < \cdots)$.
To this purpose, we adopt a combinatorial approach proposed by Griffiths and Lessard [2] to study
the distribution of the age-ordered allele frequencies $X_1, X_2, \ldots$ in a population corresponding to the
so-called Coalescent process with mutation (see e.g. [31]), whose equilibrium distribution is given by
Ewens' partition (5), for some mutation parameter $\theta > 0$. In such a context, the record index $i_j$ has the
interpretation as the number of ancestral lineages surviving back in the past, just before the last gene of
the $j$-th oldest type, observed in the current generation, is lost by mutation.

Following Griffiths and Lessard's steps we will (i) find, for every $n$, the distribution of the age-ordered
frequencies $\mathbf{n} = (n_1, \ldots, n_k)$, conditional on the record indices $\mathbf{i}_n = (1 = i_1 < i_2 < \ldots < i_k)$ of $\Pi_n$, as well
as the distribution of $\mathbf{i}_n$; (ii) take their limits as $n \to \infty$; (iii) for $m = 1, 2, \ldots$, describe the distribution
of the $m$-th age-ordered frequency conditional on $i_m$ alone. We will follow such steps, respectively, in
sections 3.1, 3.2 and 4. In addition, we will derive in section 5 a representation for the distribution of the
first $k$ age-ordered frequencies, conditional on their cumulative sum and on $i_k$.

In our investigation of EGPs, the key result is relative to the step (ii), stated in Proposition 3, where we
find that, conditional on $\mathbf{i} = (1, i_2, \ldots)$, for every $j = 1, 2, \ldots$,

$$X_j = \xi_{j-1} \prod_{i=j}^{\infty} (1 - \xi_i), \tag{9}$$

almost surely, for an independent sequence $(\xi_0, \xi_1, \ldots) \in [0, 1]^\infty$ such that $\xi_0 = 1$ and $\xi_m$ has a Beta
density with parameters $(1 - \alpha, i_{m+1} - \alpha m - 1)$ for each $m \ge 1$. The representation (9) does not depend
on $V$. The parameter $V$ affects only the distribution of the record indices $\mathbf{i} = (i_1, i_2, \ldots)$, which is a
non-homogeneous Markov chain, starting at $i_1 = 1$, with transition probabilities

$$P_j(i_{j+1} \mid i_j) = (i_j - \alpha j)_{(i_{j+1} - i_j - 1)} \frac{V_{i_{j+1}, j+1}}{V_{i_j, j}}, \qquad j \ge 1. \tag{10}$$

The representation (9)-(10) extends Griffiths and Lessard's result on Ewens' partitions ([2], (29)),
recovered just by letting $\alpha = 0$.

In section 3.3 we stress the connection between the representation (9) and a wide class of random discrete
distributions, known in the literature of Bayesian Nonparametric Statistics as Neutral to the Left (NTL)
processes ([IS], [5]) and use such a connection to show that the structure (9) with independent $\xi_1, \xi_2, \ldots$
actually characterizes EGP's among all exchangeable partitions of $\mathbb{N}$.

The representation (9) is useful to find the moments of both $X_j$ and $\sum_{i=1}^{j} X_i$, conditional on the $j$-th
record index $i_j$ alone, as shown in section 4. In the same section a recursive formula is found for the
Mellin transform of both random quantities, in terms of the Mellin transform of the size-biased pick $X_1$.
In section 4.1 an alternative description of the conditional distribution of $-\log X_j$ given $i_j$ leads to a
characterization of Ewens-Kingman's partitions as the only class for which $-\log X_j$ can be expressed as
an infinite sum of independent random variables.

Finally in section 5 we obtain an expression for the density of the first $k$ age-ordered frequencies
$X_1, \ldots, X_k$, conditional on $\sum_{i=1}^{k} X_i$ and $i_k$, as a mixture of Dirichlet distributions on the
$(k - 1)$-dimensional simplex $(k = 1, 2, \ldots)$. Such a result leads to a self-contained proof for the marginal
distribution of $i_k$, whose formula is closely related to Gnedin and Pitman's result (8).

As a completion to our results, it should be noticed that a representation of the unconditional distribution
of the age-ordered frequencies of an EGP can be derived as a mixture of the age-ordered distributions of
their extreme points, which are known: for $\alpha \le 0$, the extreme age-ordered distribution is the celebrated
two-parameter GEM distribution ([24], [25]), for which:

$$X_j = B_j \prod_{i=1}^{j-1} (1 - B_i), \qquad j = 1, 2, \ldots \tag{11}$$

for a sequence $(B_j : j = 1, 2, \ldots)$ of independent Beta random variables with parameters, respectively,
$\{(1 - \alpha, \theta + j\alpha) : j = 1, 2, \ldots\}$. Such a representation reflects a property of right-neutrality, which in a
sense is the inverse of (9), as it will be clear in section 3.3. When $\alpha$ is strictly positive, the age-ordered
frequencies in the extreme points lose such a simple structure. A description is available in [23].

We want to embed Griffiths and Lessard's method in the general setting of Pitman's theory of exchangeable
and partially exchangeable random partitions, for which our main reference is [24]. Pitman's theory
will be summarized in section 2. The key role played by record indices in the study of random partitions
has been emphasized by several authors, among which Kerov [18], Kerov and Tsilevich [19], and more
recently by Gnedin [12], and Nacu [22], who showed that the law of a partially exchangeable random
partition is completely determined by that of its record indices. We are indebted to an anonymous referee
for signalling the last two references, whose findings have an intrinsic connection with many formulae in
our section 2.



2 Exchangeable and partially exchangeable random partitions. 

We complete the introductory part with a short review of Pitman's theory of exchangeable and partially
exchangeable random partitions, and stress the connection with the distribution of their record indices.
For more details we refer the reader to [24] and [28] and reference therein. Let $\mu$ be a distribution on
$\Delta = \{x = (x_1, x_2, \ldots) \in [0, 1]^\infty : |x| \le 1\}$, endowed with a Borel sigma-field. Consider the function:

$$q_\mu(n_1, \ldots, n_k) = \int_\Delta \left( \prod_{j=1}^{k} x_j^{n_j - 1} \right) \prod_{j=1}^{k-1} \left( 1 - \sum_{i=1}^{j} x_i \right) \mu(dx). \tag{12}$$

The function $q_\mu$ is called the Partially exchangeable probability function (PEPF) of $\mu$, and has the
interpretation as the probability distribution of a random partition $\Pi_n = (\Pi_{n1}, \ldots, \Pi_{nk})$, for which a
sufficient statistic is given by its age-ordered frequencies $(n_1, \ldots, n_k)$, that is:

$$P_\mu(\Pi_n = (\pi_{n1}, \ldots, \pi_{nk})) = q_\mu(n_1, \ldots, n_k)$$

for every partition $(\pi_{n1}, \ldots, \pi_{nk})$ such that $|\pi_{nj}| = n_j$ $(j = 1, \ldots, k \le n)$.

If $q_\mu(n_1, \ldots, n_k)$ is symmetric with respect to permutations of its arguments, it is called an Exchangeable
partition probability function (EPPF), and the corresponding partition $\Pi_n$ an exchangeable random
partition.

For exchangeable Gibbs partitions, the EPPF is, for $\alpha \in (-\infty, 1]$,

$$q_{\alpha,V}(n_1, \ldots, n_k) = V_{n,k} \prod_{j=1}^{k} (1 - \alpha)_{(n_j - 1)}, \tag{13}$$

with $(V_{n,k})$ defined as in (3). This can be obtained by repeated application of (1)-(2).
A minimal sufficient statistic for an exchangeable $\Pi_n$ is given, because of the symmetry of its EPPF, by
its unordered frequencies (i.e. the count of how many frequencies in $\Pi_n$ are equal to $1, \ldots,$ to $n$), whose
distribution is given by their (unordered) sampling formula:

$$\tilde{\mu}(\mathbf{b}) = \binom{n}{\mathbf{n}} \frac{1}{\prod_{i=1}^{n} b_i!} \, q_\mu(\mathbf{n}), \tag{14}$$

where

$$\binom{n}{\mathbf{n}} = \frac{n!}{\prod_{j=1}^{k} n_j!}$$

and $b_i$ is the number of $n_j$'s in $\mathbf{n}$ equal to $i$ $(i = 1, \ldots, n)$.

It is easy to see that for a Ewens' partition (whose EPPF is (13) with $\alpha = 0$ and $V$ given by (5)), formula
(14) returns the celebrated Ewens' sampling formula.
The distribution of the age-ordered frequencies

$$\bar{\mu}(\mathbf{n}) = \binom{n}{\mathbf{n}} a(\mathbf{n}) \, q_\mu(\mathbf{n}), \tag{15}$$

differs from (14) only by a counting factor, where

$$a(\mathbf{n}) = \prod_{j=1}^{k} \frac{n_j}{n - \sum_{i=1}^{j-1} n_i} \tag{16}$$

is the distribution of the size-biased permutation of $\mathbf{n}$.

If $\Pi = (\Pi_n)$ is a (partially) exchangeable partition with PEPF $q_\mu$, then the vector $n^{-1}(n_1, n_2, \ldots)$ of
the relative frequencies, in age-order, of $\Pi_n$, converges a.s. to a random point $P = (P_1, P_2, \ldots) \in \Delta$ with
distribution $\mu$: thus the integrand in (12) has the interpretation as the conditional PEPF of a partially
exchangeable random partition, given its limit age-ordered frequencies $(p_1, p_2, \ldots)$. If $q_\mu$ is an EPPF,
then the measure $d\mu$ is invariant under size-biased permutation.




The notion of PEPF gives a generalized version of Hoppe's urn scheme, i.e. a predictive distribution for
(the age-ordered frequencies of) $\Pi_{n+1}$, given (those of) $\Pi_n$. In an urn of Hoppe's type there are colored
balls and a black ball. Every time we draw a black ball, we return it in the urn with a new ball of a new
distinct color. Otherwise, we add in the urn a ball of the same color as the ball just drawn.
Pitman's extended urn scheme works as follows. Let $q$ be a PEPF, and assume that initially in the urn
there is only the black ball. Label with $j$ the $j$-th distinct color appearing in the sample. After $n \ge 1$
samples, suppose we have put in the urn $\mathbf{n} = (n_1, \ldots, n_k)$ balls of colors $1, \ldots, k$, respectively, with colors
labeled by their order of appearance. The probability that the next ball is of color $j$ is

$$P(\mathbf{n} + e_j \mid \mathbf{n}) = \frac{q(\mathbf{n} + e_j)}{q(\mathbf{n})} I(j \le k) + \left( 1 - \sum_{i=1}^{k} \frac{q(\mathbf{n} + e_i)}{q(\mathbf{n})} \right) I(j = k + 1), \tag{17}$$

where $e_j = (\delta_{ij} : i = 1, \ldots, k)$ and $\delta_{xy}$ is the Kronecker delta. The event $(j = k + 1)$, in the last term of
the right-hand side of (17), corresponds to a new distinct color being added to the urn.

The predictive distribution of a Gibbs partition is obtained from its EPPF by substituting (13) into (17):

$$P(\mathbf{n} + e_j \mid \mathbf{n}) = \frac{n_j - \alpha}{n - \alpha k} \left( 1 - \frac{V_{n+1,k+1}}{V_{n,k}} \right) I(j \le k) + \frac{V_{n+1,k+1}}{V_{n,k}} I(j = k + 1), \tag{18}$$

which gives back our definition (1)-(2) of an EGP.
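The urn scheme can be simulated directly. The sketch below is an illustration restricted to the two-parameter family, where the ratio $V_{n+1,k+1}/V_{n,k}$ simplifies to $(\theta + \alpha k)/(\theta + n)$; that simplification, the function name and the parameter values are assumptions made for the example. It returns the block sizes in age-order together with the record indices.

```python
import random


def simulate_two_param_egp(n, alpha, theta, rng):
    """Grow a partition of {1..n}; return (block sizes in age-order, record indices)."""
    sizes, records = [1], [1]          # element 1 always starts the first block
    for m in range(1, n):              # m elements placed so far; element m + 1 arrives
        k = len(sizes)
        p_new = (theta + alpha * k) / (theta + m)   # V_{m+1,k+1} / V_{m,k} in this family
        if rng.random() < p_new:
            sizes.append(1)
            records.append(m + 1)      # least element of the new block
        else:
            # join old block j with probability proportional to n_j - alpha
            weights = [n_j - alpha for n_j in sizes]
            j = rng.choices(range(k), weights=weights)[0]
            sizes[j] += 1
    return sizes, records


if __name__ == "__main__":
    rng = random.Random(2024)          # fixed seed only to make runs repeatable
    sizes, records = simulate_two_param_egp(50, 0.5, 1.0, rng)
    print("age-ordered sizes:", sizes)
    print("record indices:   ", records)
```

Because blocks are appended in order of creation, the list of sizes is automatically in age-order and the list of records is increasing.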

The use of an urn scheme of the form (1)-(2) in Population Genetics is due to Hoppe [TS] in the context of
Ewens' partitions (infinitely-many-alleles model), for which the connection between order of appearance
in a sample and age-order of alleles is shown by [S]. In [TU] an extended version of Hoppe's approach
is suggested for more complicated, still exchangeable population models (where e.g. mutation can be
recurrent). Outside Population Genetics, the use of (1)-(2) for generating trees leading to Pitman's
two-parameter GEM frequencies can be found in the literature of random recursive trees (see e.g. [7]).
Urn schemes of the form (17) are a most natural tool to express one's a priori opinions in a Bayesian
statistical context, as pointed out by [32] and ES]. Examples of recent applications of Exchangeable
Gibbs partitions in Bayesian nonparametric Statistics are in [5T], [16]. The connection between (not
necessarily infinite) Gibbs partitions and coagulation-fragmentation processes is explored by [2] (see also
2 and reference therein).



2.1 Distribution of record indices in partially exchangeable partitions. 

Let $\Pi$ be a partially exchangeable random partition. Since, for every $n$, its age-ordered frequencies
$\mathbf{n} = (n_1, \ldots, n_k)$ are a sufficient statistic for $\Pi_n$, all realizations $\pi_n$ with the same $\mathbf{n}$ and the same
record indices $\mathbf{i}_n = (1 = i_1 < i_2 < \ldots < i_k \le n)$ must have equal probability. To evaluate the joint probability
of the pair $(\mathbf{n}, \mathbf{i}_n)$, we only need to replace $a(\mathbf{n})$ in (15) by an appropriate counting factor. This is equal to
the number of arrangements of $n$ balls, labelled from $1$ to $n$, in $k$ boxes with the constraint that exactly
$n_j$ balls fall in the same box as the ball $i_j$. Such a number was shown by [Mj to be equal to

$$\binom{n}{\mathbf{n}} a(\mathbf{n}, \mathbf{i}_n),$$

where

$$a(\mathbf{n}, \mathbf{i}_n) = \binom{n}{\mathbf{n}}^{-1} \prod_{j=1}^{k} \binom{S_j - i_j}{n_j - 1},$$

with $S_j := \sum_{i=1}^{j} n_i$. Thus, if $\Pi = (\Pi_n)$ is a partially exchangeable random partition with PEPF $q_\mu$,
then the joint probability of age-ordered frequencies and record indices is

$$\mu(\mathbf{n}, \mathbf{i}_n) = \binom{n}{\mathbf{n}} a(\mathbf{n}, \mathbf{i}_n) \, q_\mu(\mathbf{n}). \tag{19}$$
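The counting factor can be checked by brute force for small $n$. The sketch below enumerates every set partition of $\{1, \ldots, n\}$, groups the partitions by their age-ordered block sizes and record indices, and compares each group size with the product $\prod_{j} \binom{S_j - i_j}{n_j - 1}$. The helper names are hypothetical and the check is meant only as an illustration of the combinatorics, not as part of the argument.

```python
from collections import Counter
from math import comb


def growth_strings(n):
    """Yield every set partition of {1..n} as block labels numbered by order of appearance."""
    def extend(prefix, k):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for label in range(k + 1):     # join one of the k old blocks or open block k
            yield from extend(prefix + [label], max(k, label + 1))
    yield from extend([0], 1)


def sizes_and_records(labels):
    """Age-ordered block sizes and least elements (1-based) of one partition."""
    k = max(labels) + 1
    sizes = tuple(labels.count(j) for j in range(k))
    records = tuple(labels.index(j) + 1 for j in range(k))
    return sizes, records


def predicted_count(sizes, records):
    """prod_j C(S_j - i_j, n_j - 1), with S_j the partial sums of the sizes."""
    total, partial = 1, 0
    for n_j, i_j in zip(sizes, records):
        partial += n_j
        total *= comb(partial - i_j, n_j - 1)
    return total


def check(n):
    """True if the formula matches the brute-force count for every (sizes, records) pair."""
    observed = Counter(sizes_and_records(s) for s in growth_strings(n))
    return all(count == predicted_count(*key) for key, count in observed.items())


if __name__ == "__main__":
    print(check(6))
```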

The distribution of the record indices can be easily derived by marginalizing:

$$\mu(\mathbf{i}_n) = \sum_{\mathbf{n} \in B_n(\mathbf{i}_n)} \binom{n}{\mathbf{n}} a(\mathbf{n}, \mathbf{i}_n) \, q_\mu(\mathbf{n}), \tag{20}$$

where

$$B_n(\mathbf{i}_n) = \Big\{ (n_1, \ldots, n_k) : \sum_{j=1}^{k} n_j = n; \; S_{j-1} \ge i_j - 1, \; j = 1, \ldots, k \Big\}$$

is the set of all possible $\mathbf{n}$ compatible with $\mathbf{i}_n$. In [T3] such a formula is derived for the particular case of
Ewens' partitions. For general random partitions see also [22], section 2.
Notice that, for every $\mathbf{n}$ such that $|\mathbf{n}| = n$,

$$a(\mathbf{n}) = \sum_{\mathbf{i}_n \in C(\mathbf{n})} a(\mathbf{n}, \mathbf{i}_n) = \prod_{j=1}^{k} \frac{n_j}{n - \sum_{i=1}^{j-1} n_i},$$

where

$$C(\mathbf{n}) = \{ (1 = i_1 < i_2 < \ldots < i_k \le n) : k \le n, \; i_j \le S_{j-1} + 1 \}$$

is the set of all possible $\mathbf{i}_n$ compatible with $\mathbf{n}$. Then the marginal distribution of the age-ordered
frequencies (15) is recovered by summing (19) over $C(\mathbf{n})$.

This observation incidentally links a classical combinatorial result to partially exchangeable random
partitions.


Proposition 1 Let $\Pi = (\Pi_n)$ be a partially exchangeable partition with PEPF $q_\mu$.

(i) Given the frequencies $\mathbf{n} = (n_1, \ldots, n_k)$ in age-order, the probability that the least elements of the
classes of $\Pi_n$ are $\mathbf{i}_n = (i_1, \ldots, i_k)$ does not depend on $q_\mu$ and is given by

$$P(\mathbf{i}_n \mid \mathbf{n}) = \frac{1}{n!} \prod_{j=1}^{k} \frac{(S_j - i_j)!}{(S_{j-1} - i_j + 1)!} \, (n - S_{j-1}). \tag{21}$$

(ii) Let $W_j = \lim_{n \to \infty} S_j/n$. Conditional on $\{W_j : j = 1, 2, \ldots\}$, the waiting times

$$T_j = i_j - i_{j-1} - 1 \qquad (j = 2, 3, \ldots)$$

are independent geometric random variables, each with parameter $(1 - W_{j-1})$, respectively.
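Part (i) lends itself to an exact check with rational arithmetic. The following sketch evaluates the product $\frac{1}{n!}\prod_j \frac{(S_j - i_j)!}{(S_{j-1} - i_j + 1)!}(n - S_{j-1})$ for every record vector compatible with a given vector of sizes, i.e. every $1 = i_1 < \ldots < i_k$ with $i_j \le S_{j-1} + 1$. Function names and the example sizes are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial


def prob_records_given_sizes(sizes, records):
    """(1/n!) prod_j (S_j - i_j)! / (S_{j-1} - i_j + 1)! * (n - S_{j-1})."""
    n = sum(sizes)
    value = Fraction(1, factorial(n))
    prev = 0                                   # S_{j-1}
    for n_j, i_j in zip(sizes, records):
        cur = prev + n_j                       # S_j
        value *= Fraction(factorial(cur - i_j), factorial(prev - i_j + 1)) * (n - prev)
        prev = cur
    return value


def compatible_records(sizes):
    """Enumerate all 1 = i_1 < ... < i_k with i_j <= S_{j-1} + 1."""
    def extend(records, j, partial):           # partial = sum of the first j sizes
        if j == len(sizes):
            yield tuple(records)
            return
        for i_j in range(records[-1] + 1, partial + 2):
            yield from extend(records + [i_j], j + 1, partial + sizes[j])
    yield from extend([1], 1, sizes[0])


if __name__ == "__main__":
    sizes = (3, 1, 2, 2)                       # illustrative age-ordered sizes
    for records in compatible_records(sizes):
        print(records, prob_records_given_sizes(sizes, records))
```

Summing the printed probabilities over all compatible record vectors gives a direct check that the formula defines a probability distribution for the chosen sizes.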

Proof Part (i) can be obtained by a manipulation of a standard result on uniform random permutations
of $[n]$. Part (ii) can be proved by using a representation theorem due to Pitman ([21], Theorem 8). We
prefer to give a direct proof of both parts to make clear their connection. Simply notice that, for every
$\mathbf{n}$ and $\mathbf{i}_n$, the right-hand side of (21) is equal to $a(\mathbf{n}, \mathbf{i}_n)/a(\mathbf{n})$. Then, for every $\mathbf{n}$,

$$\sum_{\mathbf{i}_n \in C(\mathbf{n})} P(\mathbf{i}_n \mid \mathbf{n}) = \sum_{\mathbf{i}_n \in C(\mathbf{n})} \frac{a(\mathbf{n}, \mathbf{i}_n)}{a(\mathbf{n})} = 1$$

and

$$\sum_{\mathbf{n}} \sum_{\mathbf{i}_n \in C(\mathbf{n})} P(\mathbf{i}_n \mid \mathbf{n}) \, \bar{\mu}(\mathbf{n}) = 1,$$

where $\bar{\mu}(\mathbf{n})$ is as in (15), hence $P(\mathbf{i}_n \mid \mathbf{n})$ is a regular conditional probability and (i) is proved. Now, consider
the set

$$C_{[l, i_l]}(\mathbf{n}) := \{ i_l < i_{l+1} < \ldots < i_k \le n : i_j \le S_{j-1} + 1, \; j = l + 1, \ldots, k \}.$$

Also define, for $j = 1, \ldots, k - l + 1$, $n^*_j := S^*_j - S^*_{j-1}$ with $S^*_0 := 0$ and $S^*_j := S_{j+l-1} - i_l + 1$, and
$i^*_j := i_{j+l-1} - i_l + 1$. Then $C_{[l, i_l]}(\mathbf{n}) = C(\mathbf{n}^*)$ so that, for a fixed $l \le k$, the conditional probability of
$i_2, \ldots, i_l$, given $\mathbf{n} = (n_1, \ldots, n_k)$, is

$$P(i_2, \ldots, i_l \mid \mathbf{n}) = \frac{1}{n!} \left[ \prod_{j=1}^{l-1} \frac{(S_j - i_j)!}{(S_{j-1} - i_j + 1)!} (n - S_{j-1}) \right] \frac{(n - S_{l-1}) \, (n - i_l)!}{(S_{l-1} - i_l + 1)!} \sum_{\mathbf{i}^* \in C(\mathbf{n}^*)} \frac{a(\mathbf{n}^*, \mathbf{i}^*)}{a(\mathbf{n}^*)}, \tag{22}$$

where $n^* = |\mathbf{n}^*| = n - i_l + 1$. The sum in (22) is 1. Writing the remaining ratios of factorials as falling
factorials, the probability can therefore be rewritten as

$$P(i_2, \ldots, i_l \mid \mathbf{n}) = \frac{1}{(n - 1)_{[i_l - 1]}} \prod_{j=2}^{l} (S_{j-1} - i_{j-1})_{[i_j - i_{j-1} - 1]} \, (n - S_{j-1}),$$

where $a_{[r]} = a(a - 1) \cdots (a - r + 1)$ is the falling factorial. Now, define

$$(W_1, W_2, \ldots) := \lim_{n \to \infty} (S_j/n : j = 1, 2, \ldots);$$

then, for $l$ fixed,

$$\lim_{n \to \infty} P(i_2, \ldots, i_l \mid \mathbf{n}) = \prod_{j=2}^{l} W_{j-1}^{\,i_j - i_{j-1} - 1} (1 - W_{j-1}),$$

which is the distribution of $l - 1$ independent geometric random variables, with parameters $(1 - W_{j-1})$,
$j = 2, \ldots, l$, and the proof is complete.



By combining the definition (12) of PEPF and Proposition 1 one recovers an identity due to Nacu ([22],
(7)).

Corollary 1 ([22], Proposition 5) For every sequence $1 = i_1 < i_2 < \ldots < i_{k+1} = n + 1$, and every point
$x = (x_1, x_2, \ldots) \in \Delta$,

$$\prod_{j=1}^{k} w_j^{\,i_{j+1} - i_j - 1} = \sum_{\mathbf{n} \in B_n(\mathbf{i}_n)} \prod_{j=1}^{k} \binom{S_j - i_j}{n_j - 1} x_j^{\,n_j - 1}, \tag{23}$$

where $w_j = \sum_{i=1}^{j} x_i$ $(j = 1, 2, \ldots)$.

Proof Multiply both sides by $\prod_{j=2}^{k} (1 - w_{j-1})$: by Proposition 1 and (12), formula (23) is just the equality
(20) with the choice $d\mu = \delta_x$.
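The identity can be tested exactly for a small example. The sketch below assumes the reading $\prod_{j=1}^{k} w_j^{\,i_{j+1}-i_j-1} = \sum_{\mathbf{n}} \prod_{j=1}^{k} \binom{S_j - i_j}{n_j - 1} x_j^{\,n_j - 1}$, with the convention $i_{k+1} = n + 1$ and the sum running over sizes with $S_{j-1} \ge i_j - 1$; this reading, the function names and the numerical values are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def compositions(n, k):
    """All k-tuples of positive integers summing to n."""
    if k == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def lhs(records, n, x):
    """prod_j w_j^(i_{j+1} - i_j - 1) with i_{k+1} = n + 1 and w_j the partial sums of x."""
    extended = list(records) + [n + 1]
    value, w = Fraction(1), Fraction(0)
    for j, x_j in enumerate(x[:len(records)]):
        w += x_j
        value *= w ** (extended[j + 1] - extended[j] - 1)
    return value


def rhs(records, n, x):
    """Sum over compatible sizes of prod_j C(S_j - i_j, n_j - 1) x_j^(n_j - 1)."""
    total = Fraction(0)
    for sizes in compositions(n, len(records)):
        term, partial = Fraction(1), 0
        for n_j, i_j, x_j in zip(sizes, records, x):
            if partial < i_j - 1:      # S_{j-1} >= i_j - 1 fails: sizes not compatible
                term = Fraction(0)
                break
            partial += n_j
            term *= comb(partial - i_j, n_j - 1) * x_j ** (n_j - 1)
        total += term
    return total


if __name__ == "__main__":
    x = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
    print(lhs((1, 3, 4), 7, x), rhs((1, 3, 4), 7, x))
```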



3 Age-ordered frequencies conditional on the record indices in Exchangeable Gibbs partitions

3.1 Conditional distribution of sample frequencies.

From now on we will focus only on EGP$(\alpha, V)$. We have seen that the conditional distribution of the
record indices, given the age-ordered frequencies of a partially exchangeable random partition, is purely
combinatorial as it does not depend on its PEPF. We will now find the conditional distribution of the
age-ordered frequencies $\mathbf{n}$ given the record indices, i.e. the step (i) of the plan outlined in the introduction.
We show that such a distribution does not depend on the parameter $V$, which in fact affects only the
marginal distribution of $\mathbf{i}_n$, as explained in the following Lemma.

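Lemma 1 Let $\Pi = (\Pi_n)$ be an EGP$(\alpha, V)$, for some $\alpha \in (-\infty, 1)$ and $V = (V_{n,k} : k \le n = 1, 2, \ldots)$.
For each $n$, the probability that the record indices in $\Pi_n$ are $\mathbf{i}_n = (i_1, \ldots, i_k)$ is

$$\mu_{\alpha,V}(\mathbf{i}_n) = \psi^{-1}_{\alpha,n,k}(\mathbf{i}_n) \, V_{n,k}, \tag{24}$$

where

$$\psi_{\alpha,n,k}(\mathbf{i}_n) = \frac{\Gamma(1 - \alpha)}{\Gamma(n - \alpha k)} \prod_{j=2}^{k} \frac{\Gamma(i_j - j\alpha)}{\Gamma(i_j - j\alpha - (1 - \alpha))}. \tag{25}$$

Then the sequence $i_1, i_2, \ldots$ forms a non-homogeneous Markov chain starting at $i_1 = 1$ and with transition
probabilities given by

$$P_j(i_{j+1} \mid i_j) = (i_j - \alpha j)_{(i_{j+1} - i_j - 1)} \frac{V_{i_{j+1}, j+1}}{V_{i_j, j}}, \qquad j \ge 1. \tag{26}$$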
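A numerical sanity check of the Lemma is possible for small $n$. The sketch below assumes that $\psi_{\alpha,n,k}(\mathbf{i}_n) = \frac{\Gamma(1-\alpha)}{\Gamma(n-\alpha k)}\prod_{j=2}^{k}\frac{\Gamma(i_j - j\alpha)}{\Gamma(i_j - j\alpha - (1-\alpha))}$ and that the law of the record indices is $V_{n,k}/\psi_{\alpha,n,k}(\mathbf{i}_n)$, and it uses the two-parameter coefficients with $0 < \alpha < 1$ and $\theta > 0$ as a concrete choice of $V$. Names and parameter values are illustrative.

```python
from itertools import combinations
from math import gamma, isclose


def psi(records, n, alpha):
    """Gamma(1-a)/Gamma(n - a k) * prod_{j>=2} Gamma(i_j - j a)/Gamma(i_j - j a - (1-a))."""
    k = len(records)
    value = gamma(1 - alpha) / gamma(n - alpha * k)
    for j in range(2, k + 1):
        i_j = records[j - 1]
        value *= gamma(i_j - j * alpha) / gamma(i_j - j * alpha - (1 - alpha))
    return value


def v_two_param(n, k, alpha, theta):
    """Two-parameter coefficient prod_{j=1}^k (theta + alpha (j - 1)) / theta_(n)."""
    numerator = 1.0
    for j in range(1, k + 1):
        numerator *= theta + alpha * (j - 1)
    return numerator * gamma(theta) / gamma(theta + n)


def record_law(records, n, alpha, theta):
    """Probability that the record indices of the partition of [n] equal `records`."""
    return v_two_param(n, len(records), alpha, theta) / psi(records, n, alpha)


def total_mass(n, alpha, theta):
    """Sum of record_law over every possible record vector 1 = i_1 < ... < i_k <= n."""
    total = 0.0
    for k in range(1, n + 1):
        for tail in combinations(range(2, n + 1), k - 1):
            total += record_law((1,) + tail, n, alpha, theta)
    return total


if __name__ == "__main__":
    print(total_mass(7, 0.3, 1.2))   # illustrative parameters
```

For $n = 2$ the two possible record vectors give $(1-\alpha)/(\theta+1)$ and $(\theta+\alpha)/(\theta+1)$, matching the prediction rule directly.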

Proof The proof can be carried out by using the urn scheme (18). For every $n$, let $K_n$ be the number of
distinct colors which appeared before the $(n+1)$-th ball was picked. From (18), the sequence $(K_n : n \ge 1)$
starts from $K_1 = 1$ and obeys, for every $n$, the prediction rule:

$$P(K_{n+1} = k_{n+1} \mid K_n = k) = \left( 1 - \frac{V_{n+1,k+1}}{V_{n,k}} \right) I(k_{n+1} = k) + \frac{V_{n+1,k+1}}{V_{n,k}} I(k_{n+1} = k + 1). \tag{27}$$

By definition, $K_n$ jumps at points $1 < i_2 < \cdots$, due to the equivalence

$$\{K_{n+1} = K_n + 1 \mid K_n = k\} = \{i_{k+1} = n + 1\}.$$

Therefore, from (27) and the recursion (3), setting $i_{k+1} := n + 1$,

$$\mu_{\alpha,V}(\mathbf{i}_n) = \prod_{j=2}^{k} \frac{V_{i_j, j}}{V_{i_j - 1, j-1}} \prod_{j=1}^{k} \prod_{l=i_j}^{i_{j+1} - 2} \left( 1 - \frac{V_{l+1, j+1}}{V_{l, j}} \right) = V_{n,k} \prod_{j=1}^{k} \prod_{l=i_j}^{i_{j+1} - 2} (l - \alpha j). \tag{28}$$

The last product in (28) is equal to

$$\prod_{j=1}^{k} \frac{\Gamma(i_{j+1} - 1 - \alpha j)}{\Gamma(i_j - \alpha j)} = \psi^{-1}_{\alpha,n,k}(\mathbf{i}_n), \tag{29}$$

and this proves (24). The second part of the Lemma (i.e. the transition probabilities (26)) follows
immediately just by replacing, in (24), $n$ with $i_k$, for every $k$, to show that

$$\mu_{\alpha,V}(\mathbf{i}_{i_k}) = \prod_{j=1}^{k-1} P_j(i_{j+1} \mid i_j),$$

for $P_j$ satisfying (26) for every $j$.

The distribution of the age-ordered frequencies in an EGP $\Pi_n$, conditional on the record indices, can be
easily obtained from Lemma 1 and (19).

Proposition 2 Let $\Pi = (\Pi_n)$ be an EGP$(\alpha, V)$, for some $\alpha \in (-\infty, 1)$ and $V = (V_{n,k} : k \le n =
1, 2, \ldots)$. For each $n$, the conditional distribution of the sample frequencies $\mathbf{n}$ in age-order, given the
vector $\mathbf{i}_n$ of indices, is independent of $V$ and is equal to

$$\mu_\alpha(\mathbf{n} \mid \mathbf{i}_n) = \psi_{\alpha,n,k}(\mathbf{i}_n) \prod_{j=1}^{k} \binom{S_j - i_j}{n_j - 1} (1 - \alpha)_{(n_j - 1)}. \tag{30}$$
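The conditional law can be tabulated for small cases. The sketch below assumes the form $\psi_{\alpha,n,k}(\mathbf{i}_n)\prod_{j}\binom{S_j - i_j}{n_j - 1}(1-\alpha)_{(n_j-1)}$ with $\psi$ the ratio of Gamma functions from Lemma 1, and assigns probability zero to sizes that violate $S_{j-1} \ge i_j - 1$. Function names, the record vector and the value $\alpha = 0.4$ are illustrative assumptions; note that no coefficient $V$ appears anywhere in the code.

```python
from math import comb, gamma, isclose


def rising(a, x):
    """Generalized increasing factorial a_(x) = Gamma(a + x) / Gamma(a)."""
    return gamma(a + x) / gamma(a)


def psi(records, n, alpha):
    """Gamma(1-a)/Gamma(n - a k) * prod_{j>=2} Gamma(i_j - j a)/Gamma(i_j - j a - (1-a))."""
    k = len(records)
    value = gamma(1 - alpha) / gamma(n - alpha * k)
    for j in range(2, k + 1):
        i_j = records[j - 1]
        value *= gamma(i_j - j * alpha) / gamma(i_j - j * alpha - (1 - alpha))
    return value


def compositions(n, k):
    """All k-tuples of positive integers summing to n."""
    if k == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def conditional_law(sizes, records, alpha):
    """Conditional probability of the age-ordered sizes given the record indices."""
    value, partial = psi(records, sum(sizes), alpha), 0
    for n_j, i_j in zip(sizes, records):
        if partial < i_j - 1:          # sizes incompatible with the records
            return 0.0
        partial += n_j
        value *= comb(partial - i_j, n_j - 1) * rising(1 - alpha, n_j - 1)
    return value


if __name__ == "__main__":
    records, n, alpha = (1, 3, 4), 8, 0.4
    table = {s: conditional_law(s, records, alpha) for s in compositions(n, len(records))}
    print(sum(table.values()))
```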



Remark 1 Notice that, as $\alpha \to 0$, formula (30) reduces to that for Ewens' partitions, proved by Griffiths
and Lessard:

$$\mu_0(\mathbf{n} \mid \mathbf{i}_n) = \frac{\prod_{j=2}^{k} (i_j - 1)}{(n - 1)!} \prod_{j=1}^{k} \frac{(S_j - i_j)!}{(S_{j-1} - i_j + 1)!}.$$

Proof Recall that the probability of a pair $(\mathbf{n}, \mathbf{i}_n)$ is given by

$$\mu_{\alpha,V}(\mathbf{n}, \mathbf{i}_n) = \binom{n}{\mathbf{n}} a(\mathbf{n}, \mathbf{i}_n) \, q_{\alpha,V}(\mathbf{n}). \tag{31}$$

Now it is easy to derive the conditional distribution of a configuration given a sequence $\mathbf{i}_n$, as:

$$\mu_\alpha(\mathbf{n} \mid \mathbf{i}_n) = \frac{\mu_{\alpha,V}(\mathbf{n}, \mathbf{i}_n)}{\mu_{\alpha,V}(\mathbf{i}_n)} = \frac{\left[ \prod_{j=1}^{k} \binom{S_j - i_j}{n_j - 1} (1 - \alpha)_{(n_j - 1)} \right] V_{n,k}}{\psi^{-1}_{\alpha,n,k}(\mathbf{i}_n) \, V_{n,k}},$$

and the proof is complete.

3.2 The distribution of the limit frequencies given the record indices.

We now have all elements to derive a representation for the limit relative frequencies in age-order,
conditional on the limit sequence of record indices $\mathbf{i} = (i_1 < i_2 < \cdots)$ generated by an EGP$(\alpha, V)$.

Proposition 3 Let $\Pi = (\Pi_n)_{n \ge 1}$ be an Exchangeable Gibbs Partition with index $\alpha > -\infty$ for some $V$.
Let $\mathbf{i} = (i_1 < i_2 < \ldots)$ be its limit sequence of record indices and $X_1, X_2, \ldots$ be the age-ordered limit
frequencies as $n \to \infty$.

A regular conditional distribution of $X_1, X_2, \ldots$ given the record indices is given by

$$X_j = \xi_{j-1} \prod_{m=j}^{\infty} (1 - \xi_m), \qquad j \ge 1, \tag{33}$$

a.s., where $\xi_0 = 1$ and, for $j \ge 1$, $\xi_j$ is a Beta random variable in $[0, 1]$ with parameters
$(1 - \alpha, i_{j+1} - j\alpha - 1)$.

Remark 2 Proposition 3 is a statement about a regular conditional distribution. The question about the
existence of a limit conditional distribution of $X_j$ given $\mathbf{i}$, as a function of $\mathbf{i} = \lim_n \mathbf{i}_n$, has a different answer
according to the choice of $\alpha$, as a consequence of the limit behavior of $K_n$, the number of blocks of an
EGP $\Pi_n$, as recalled in the introduction. For $\alpha < 0$, $\mathbf{i}$ is almost surely a finite sequence; for nonnegative
$\alpha$, the length $l$ of $\mathbf{i}$ will be a.s. either $l \sim s \log n$ (for $\alpha = 0$) or $l \sim s n^\alpha$ (for $\alpha > 0$), for some $s \in [0, \infty]$.
The infinite product representation (33) still holds in any case if we adopt the convention $i_k = \infty$ for
every $k > K_\infty$, where $K_\infty := \lim_{n \to \infty} K_n$.
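To make the product representation concrete, the sketch below draws the frequencies for a finite vector of record indices, using the convention of Remark 2 that indices beyond the last record are infinite, so that the corresponding Beta variables are taken to be zero. This truncation, the function name and the example values are assumptions for illustration; for $\alpha \ge 0$ there are infinitely many records, so a finite vector only gives an approximation of the limit object.

```python
import random


def sample_frequencies(records, alpha, rng):
    """Draw (X_1..X_k) given record indices i_1..i_k, with xi_m = 0 for m >= k."""
    k = len(records)
    # xi_0 = 1 and xi_j ~ Beta(1 - alpha, i_{j+1} - j alpha - 1) for j = 1..k-1;
    # records[j] is i_{j+1} because the list is 0-based.
    xi = [1.0] + [rng.betavariate(1 - alpha, records[j] - j * alpha - 1)
                  for j in range(1, k)]
    x = [0.0] * k
    tail = 1.0                         # running value of prod_{m >= j} (1 - xi_m)
    for j in range(k, 0, -1):
        x[j - 1] = xi[j - 1] * tail    # X_j = xi_{j-1} * prod_{m >= j} (1 - xi_m)
        tail *= 1.0 - xi[j - 1]
    return x


if __name__ == "__main__":
    rng = random.Random(7)             # fixed seed only to make runs repeatable
    x = sample_frequencies((1, 2, 5, 9), 0.5, rng)
    print(x, sum(x))
```

With the truncation in place the cumulative sum $W_k$ equals one, so the sampled frequencies form a probability vector of length $k$.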

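Proof The form (30) of the conditional density $\mu_\alpha(\mathbf{n} \mid \mathbf{i}_n)$ implies

$$\sum_{|\mathbf{n}| = n} \prod_{j=1}^{k} \binom{S_j - i_j}{n_j - 1} (1 - \alpha)_{(n_j - 1)} = \frac{1}{\psi_{\alpha,n,k}(\mathbf{i}_n)}. \tag{34}$$

For some $r \le k$, let $a_2, \ldots, a_r$ be positive integers and set $a_1 = 0$ and $a_{r+1} = \ldots = a_k = 0$. Define
$\mathbf{i}'_n = (i'_1, \ldots, i'_k)$ where

$$i'_j = i_j + \sum_{i=1}^{j} a_i \qquad (j = 1, \ldots, k).$$

Now take the sum (34) with $\mathbf{i}_n$ replaced by $\mathbf{i}'_n$, and multiply it by $\psi_{\alpha,n,k}(\mathbf{i}_n)$. We obtain

$$\frac{\psi_{\alpha,n,k}(\mathbf{i}_n)}{\psi_{\alpha,n,k}(\mathbf{i}'_n)} = E\left[ \prod_{j=1}^{k} \frac{(S_j - i'_j)! \, (S_{j-1} - i_j + 1)!}{(S_j - i_j)! \, (S_{j-1} - i'_j + 1)!} \right], \tag{35}$$

where the expectation is taken with respect to $\mu_\alpha(\cdot \mid \mathbf{i}_n)$. The left hand side of (35) is

$$\frac{\psi_{\alpha,n,k}(\mathbf{i}_n)}{\psi_{\alpha,n,k}(\mathbf{i}'_n)} = \prod_{j=2}^{k} \frac{[i_j - j\alpha - (1 - \alpha)]_{(\sum_{i=1}^{j} a_i)}}{[i_j - j\alpha]_{(\sum_{i=1}^{j} a_i)}} = \prod_{j=1}^{k-1} E\left( (1 - \xi_j)^{\sum_{i=1}^{j+1} a_i} \right), \tag{36}$$

where $\xi_1, \ldots, \xi_{k-1}$ are independent Beta random variables, each with parameters $(1 - \alpha, i_{j+1} - j\alpha - 1)$.
Let $b_j = \sum_{i=1}^{j} a_i$. On the right hand side of (35), $S_0 = 0$, $S_k = n$, so the product is equal to

$$\prod_{j=1}^{k-1} \left( \frac{S_j}{n} \right)^{a_{j+1}} \left[ \prod_{j=2}^{k} \prod_{i=0}^{b_j - 1} \frac{1 - \frac{i_j - 1 + i}{S_{j-1}}}{1 - \frac{i_j + i}{S_j}} \right]. \tag{37}$$

Since $a_j = 0$ for $j = 1$ and $j > r$, as $k, n \to \infty$ the product inside square brackets converges to 1 so the
limit of (37) is

$$\prod_{j=1}^{r-1} W_j^{\,a_{j+1}},$$

where $W_j = \lim_{n \to \infty} S_j/n$. Hence from (35) it follows that in the limit

$$E\left( \prod_{j=1}^{r-1} W_j^{\,a_{j+1}} \right) = \prod_{j=1}^{\infty} E\left( (1 - \xi_j)^{\sum_{i=1}^{j+1} a_i} \right),$$

which gives the limit distribution of the cumulative sums:

$$W_j = \prod_{i=j}^{\infty} (1 - \xi_i), \qquad j = 1, 2, \ldots$$

But

$$X_j = W_j - W_{j-1} = \prod_{i=j}^{\infty} (1 - \xi_i) - \prod_{i=j-1}^{\infty} (1 - \xi_i) = \xi_{j-1} \prod_{i=j}^{\infty} (1 - \xi_i),$$

and the proof is complete.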



3.3 Conditional Gibbs frequencies, Neutral distributions and invariance under size-biased permutation. 

Proposition 3 says that, conditional on all the record indices $i_1, i_2, \ldots$, the sequence of relative increments
of an EGP$(\alpha, V)$

$$\xi = \left( \frac{X_2}{W_2}, \frac{X_3}{W_3}, \ldots \right) \tag{38}$$

is a sequence of independent coordinates. In fact, such a process can be interpreted as the negative,
time-reversed version of a so-called Beta-Stacy process, a particular class of random discrete distributions,
introduced in the context of Bayesian Nonparametric Statistics as a useful tool to make inference for
right-censored data (see [35], [5U] for a modern account).

It is possible to show that such an independence property of the $\xi$ sequence (conditional on the indices)
actually characterizes the family of EGP partitions. To make clear such a statement we recall a concept 
of neutrality for random [0, l]-valued sequences, introduced by Connor and Mosimann 0] in 1962 and 
refined in 1974 by Doksum [6] in the context of nonparametric inference and, more recently, by Walker 
and Muliere [50] , 

Definition 1 Let $k$ be any fixed positive integer (not necessarily finite).

(i) Let $P = (P_1, P_2, \ldots, P_k)$ be a random point in $[0, 1]^k$ such that $\sum_{i=1}^{k} P_i \le 1$ and, for every
$j = 1, \ldots, k - 1$, denote $F_j = \sum_{i=1}^{j} P_i$. $P$ is called a Neutral to the Right (NTR) sequence if the vector
$B_1, B_2, \ldots, B_{k-1}$ of relative increments

$$B_j = \frac{P_j}{1 - F_{j-1}}, \qquad j = 1, \ldots, k - 1,$$

is a sequence of independent random variables in $[0, 1]$.

Let $(\boldsymbol{\alpha}, \boldsymbol{\beta})$ be a point in $[0, \infty]^{k-1}$. A NTR vector such that every increment $B_j$ is a Beta$(\alpha_j, \beta_j)$
$(j = 1, \ldots, k - 1)$ is called a Beta-Stacy distribution with parameter $(\boldsymbol{\alpha}, \boldsymbol{\beta})$.

(ii) A Neutral to the Left (NTL) vector $P = (P_1, P_2, \ldots, P_k)$ is a vector such that $P^* := (P_k, P_{k-1}, \ldots, P_1)$
is NTR.

A Left-Beta-Stacy distribution with parameter $(\boldsymbol{\alpha}, \boldsymbol{\beta})$ is a NTL vector $P$ such that $P^*$ is Beta-Stacy
$(\boldsymbol{\alpha}^*, \boldsymbol{\beta}^*) = ((\alpha_{k-1}, \beta_{k-1}), \ldots, (\alpha_1, \beta_1))$.

A known result due to [25] is that the only class of exchangeable random partitions whose limit age-ordered
frequencies are (unconditionally) a NTR distribution is Pitman's two-parameter family, i.e.
the EGP$(\alpha, V)$ with $V$-coefficients given by (4). In this case, the age-ordered frequencies follow the
so-called two-parameter GEM distribution, a special case of Beta-Stacy distribution with each $B_j$ being a
Beta$(1 - \alpha, \theta + j\alpha)$ random variable.

The age-ordered frequencies of all other Gibbs partitions are not NTR; on the other side, Proposition 3
shows that, conditional on the record indices $i_1, i_2, \ldots$, and on $W_k$, they are all NTL distributions. For a
fixed $k$ set

$$Y_j = \frac{X_{k-j+1}}{W_k}, \qquad 1 \le j \le k.$$

Then, with $F_j = \sum_{i=1}^{j} Y_i$,

$$1 - F_j = \frac{W_{k-j}}{W_k}$$

and

$$\xi_{k-j} = \frac{Y_j}{1 - F_{j-1}}. \tag{39}$$

By construction, the sequence $Y_1, \ldots, Y_k$ is a Beta-Stacy sequence with parameters $\alpha_{k,j} = 1 - \alpha$ and
$\beta_{k,j} = i_{k-j+1} - (k - j + 1)\alpha - (1 - \alpha)$ $(j = 1, \ldots, k)$. The property of (conditional) left-neutrality is
maintained as $k \to \infty$ (just condition on $W_{K_\infty} = 1$ where $K_\infty = \lim_{n \to \infty} K_n$).
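The reversal between left and right neutrality can be illustrated with exact arithmetic. The sketch below builds frequencies from given values $\xi_1, \ldots, \xi_{k-1}$ through $X_j = \xi_{j-1} W_j$, $W_k = 1$, $W_j = W_{j+1}(1 - \xi_j)$, then computes the relative increments $B_j = P_j/(1 - F_{j-1})$ of the reversed vector and recovers the $\xi$ values in reverse order. The function names and the numerical values are hypothetical.

```python
from fractions import Fraction


def relative_increments(p):
    """B_j = P_j / (1 - F_{j-1}) for j = 1..k-1, with F_j the partial sums of p."""
    increments, used = [], Fraction(0)
    for p_j in p[:-1]:
        increments.append(p_j / (1 - used))
        used += p_j
    return increments


def frequencies_from_xi(xi):
    """X_j = xi_{j-1} W_j with xi_0 = 1, W_k = 1 and W_j = W_{j+1} (1 - xi_j)."""
    k = len(xi) + 1                       # xi holds xi_1 .. xi_{k-1}
    w = [Fraction(1)] * (k + 1)           # w[j] = W_j, index 0 unused
    for j in range(k - 1, 0, -1):
        w[j] = w[j + 1] * (1 - xi[j - 1])
    return [w[1]] + [xi[j - 2] * w[j] for j in range(2, k + 1)]


if __name__ == "__main__":
    xi = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
    x = frequencies_from_xi(xi)
    print("frequencies:", x)
    print("increments of the reversed vector:", relative_increments(x[::-1]))
```

The increments of the reversed vector coincide with $(\xi_{k-1}, \ldots, \xi_1)$, which is the deterministic content of the identity $\xi_{k-j} = Y_j/(1 - F_{j-1})$.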

The following proposition is a converse of Proposition 3.

Proposition 4 Let $X = (X_1, X_2, \ldots) \in \Delta$ be the age-ordered frequencies of an infinite exchangeable
random partition $\Pi$ of $\mathbb{N}$. Assume that, conditionally on the record indices of $\Pi$, $X$ is a NTL sequence. Then
$\Pi$ is an exchangeable Gibbs partition for some parameters $(\alpha, V)$.

Proof The frequencies of an exchangeable random partition of $\mathbb{N}$ are in age-order if and only if their
distribution is invariant under size-biased permutation (ISBP, see [5], [25]). To prove the proposition,
we combine two known results: the first is a characterization of ISBP distributions; the second is a
characterization of the Dirichlet distribution in terms of NTR processes. We recall such results in two
lemmas.

Lemma 2 Invariance under size-biased permutation ([25], Theorem 4). Let $X$ be a random point of
$[0, 1]^\infty$ such that $|X| \le 1$ almost surely with respect to a probability measure $d\mu$. For every $k$, let $\mu_k$
denote the distribution of $X_1, \ldots, X_k$, and $G_k$ the measure on $[0, 1]^k$, absolutely continuous with respect
to $\mu_k$ with density

$$\frac{dG_k}{d\mu_k}(x_1, \ldots, x_k) = \prod_{j=1}^{k-1} (1 - w_j),$$

where $w_j = \sum_{i=1}^{j} x_i$, $j = 1, 2, \ldots$

$X$ is invariant under size-biased permutation if and only if $G_k$ is symmetric with respect to permutations
of the coordinates in $\mathbb{R}^k$.

Let $X$ be the frequencies of an exchangeable partition $\Pi$, and denote with $P_\mu$ the marginal law of the
record indices of $\Pi$. Consider the measure $G_k$ of Lemma 2. By Proposition 1 (ii), for every $k$

$$\begin{aligned}
G_k(dx_1 \times \cdots \times dx_k) &= \mu_k(dx_1 \times \cdots \times dx_k) \prod_{j=1}^{k-1} (1 - w_j) \\
&= \mu_k(dx_1 \times \cdots \times dx_k) \, P(i_1 = 1, i_2 = 2, \ldots, i_k = k \mid x_1, \ldots, x_k) \\
&= \mu_k(dx_1 \times \cdots \times dx_k \mid i_k = k) \, P_\mu(i_k = k).
\end{aligned} \tag{40}$$

An equivalent characterization of ISBP measures is:

Corollary 2 The law of $X$ is invariant under size-biased permutation if and only if for every $k$, there
is a version of the conditional distribution

$$\mu_k(dx_1 \times \cdots \times dx_k \mid i_k = k)$$

which is invariant under permutations of coordinates in $\mathbb{R}^k$.

The other result we recall is about Dirichlet distributions. 

Lemma 3 Dirichlet and neutrality ([5], Theorem 7). Let $P$ be a random $k$-dimensional vector with positive components such that their sum equals 1. If $P$ is NTR and $P_k$ is independent of $(1 - P_k)^{-1}(P_1, \ldots, P_{k-1})$, then $P$ has the Dirichlet distribution.



Now we have all the elements to prove Proposition 4. Let $\mu(\cdot \mid i_1, i_2, \ldots)$ be the distribution of an NTL vector such that $\xi_j := X_{j+1}/W_{j+1}$ has marginal law $d\eta_j$ for $j = 1, 2, \ldots$. For every $k$, given $i_1, \ldots, i_k$, the vector $(X_2/W_2, \ldots, X_k/W_k)$ is conditionally independent of $W_k$ and

$$\mu_k(dx_1 \times \cdots \times dx_k \mid i_k = k) = \left( \prod_{j=1}^{k-1} \eta_j(d\xi_j) \right) \zeta_k(dw_k)$$

where $\zeta_k$ is the conditional law of $W_k$ given $i_k = k$.
For $X$ to be ISBP, Corollary 2 implies that the product

$$\prod_{j=1}^{k-1} \eta_j(d\xi_j)$$

must be a symmetric function of $x_1, \ldots, x_k$. Then, for every $k$, the vector $(X_1/W_k, \ldots, X_k/W_k)$ is both NTL and NTR, which implies in particular that $X_k/W_k$ is independent of $W_{k-1}^{-1}(X_1, \ldots, X_{k-1})$. Therefore, by Lemma 3 and symmetry, $(X_1/W_k, \ldots, X_k/W_k)$ is, conditionally on $W_k$ and $\{i_k = k\}$, a symmetric
Dirichlet distribution, with parameter, say 1 — a > 0. By (|4T)|). the EPPF corresponding to d/i is equal 
to 



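$$\mathbb{E}\left[ \prod_{j=1}^{k} X_j^{n_j - 1} \prod_{j=1}^{k-1} (1 - W_j) \right] = \mathbb{P}(i_k = k)\, \mathbb{E}\left[ \prod_{j=1}^{k} X_j^{n_j - 1} \,\Big|\, i_k = k \right],$$

where the equality follows from (40). By the NTL assumption, we can write

$$\mathbb{E}\left[ \prod_{j=1}^{k} X_j^{n_j - 1} \,\Big|\, i_k = k \right] = \mathbb{E}\left[ W_k^{n-k} \mid i_k = k \right] \prod_{j=1}^{k-1} \mathbb{E}\left[ \xi_j^{n_{j+1} - 1} (1 - \xi_j)^{S_j - j} \,\Big|\, i_k = k \right] \tag{41}$$

where $S_j = \sum_{i=1}^{j} n_i$ ($j = 1, \ldots, k$) and $n = S_k$. The equality is due to the identity

$$\prod_{j=1}^{k} X_j^{n_j - 1} = W_k^{n-k} \prod_{j=1}^{k-1} \left( \frac{X_{j+1}}{W_{j+1}} \right)^{n_{j+1} - 1} \left( \frac{W_j}{W_{j+1}} \right)^{S_j - j}, \tag{42}$$

combined with the conditional independence of $\xi_1, \ldots, \xi_{k-1}$ and $W_k$.

Now, set

$$V_{n,k} = \frac{\mathbb{P}(i_k = k)\, \mathbb{E}(W_k^{n-k} \mid i_k = k)}{[k(1 - \alpha)]_{(n-k)}}.$$

Under the symmetric Dirichlet law with parameter $1 - \alpha$, each $\xi_j$ is Beta$(1 - \alpha, j(1 - \alpha))$, so equality (42) implies

$$\begin{aligned}
\mathbb{E}\left[ \prod_{j=1}^{k} X_j^{n_j - 1} \prod_{j=1}^{k-1} (1 - W_j) \right]
&= \mathbb{P}(i_k = k) \left[ \int w_k^{n-k}\, \zeta_k(dw_k) \right] \prod_{j=1}^{k-1} \frac{(1 - \alpha)_{(n_{j+1} - 1)}\, [j(1 - \alpha)]_{(S_j - j)}}{[(j+1)(1 - \alpha)]_{(S_{j+1} - (j+1))}} \\
&= V_{n,k} \prod_{j=1}^{k} (1 - \alpha)_{(n_j - 1)},
\end{aligned}$$

which completes the proof.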
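The last step of the proof of Proposition 4 rests on a telescoping product of Beta moments. As a sanity check only, the sketch below assumes that $\xi_j = X_{j+1}/W_{j+1}$ is Beta$(1-\alpha, j(1-\alpha))$ (the law implied by a symmetric Dirichlet distribution with parameter $1-\alpha$) and verifies in exact arithmetic that the product of the moments $\mathbb{E}[\xi_j^{n_{j+1}-1}(1-\xi_j)^{S_j-j}]$ collapses to $\prod_j (1-\alpha)_{(n_j-1)} / [k(1-\alpha)]_{(n-k)}$. The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction


def rising(a, n):
    """Rising factorial a_(n) = a (a+1) ... (a+n-1), with a_(0) = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= a + i
    return out


def beta_chain(alpha, sizes):
    """Product over j = 1..k-1 of E[xi_j^(n_{j+1}-1) (1-xi_j)^(S_j-j)],
    assuming xi_j ~ Beta(1-alpha, j(1-alpha)) (illustrative assumption)."""
    a = 1 - alpha
    out = Fraction(1)
    S = sizes[0]  # S_1 = n_1
    for j in range(1, len(sizes)):
        n_next = sizes[j]  # n_{j+1}
        # Beta(p, q) mixed moment: p_(r) q_(s) / (p+q)_(r+s)
        out *= (rising(a, n_next - 1) * rising(j * a, S - j)
                / rising((j + 1) * a, S + n_next - (j + 1)))
        S += n_next
    return out


def gibbs_form(alpha, sizes):
    """prod_j (1-alpha)_(n_j - 1) / [k(1-alpha)]_(n-k)."""
    a = 1 - alpha
    k, n = len(sizes), sum(sizes)
    num = Fraction(1)
    for nj in sizes:
        num *= rising(a, nj - 1)
    return num / rising(k * a, n - k)


if __name__ == "__main__":
    alpha = Fraction(1, 3)
    sizes = (3, 1, 2, 4)  # block sizes n_1, ..., n_k, each at least 1
    print(beta_chain(alpha, sizes) == gibbs_form(alpha, sizes))
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than with a floating-point tolerance.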



4 Age-ordered frequencies conditional on a single record index. 

A representation for the Mellin transform of the $m$-th age-ordered cumulative frequency $W_m$, conditional on $i_m$ alone ($m = 1, 2, \ldots$), can be derived by using Proposition 3. We first point out a characterization of the moments of $W_m$, stated in the following lemma.

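Lemma 4 Let $X_1, X_2, \ldots$ be the limit age-ordered frequencies generated by a Gibbs partition with parameters $\alpha, V$. For every $m = 1, 2, \ldots$ and nonnegative integer $n$,

$$\mathbb{E}(W_m^n \mid i_m) = (i_m - \alpha m)_{(n)} \frac{V_{i_m + n, m}}{V_{i_m, m}} \tag{43}$$

and

$$\mathbb{E}(X_m^n \mid i_m) = (1 - \alpha)_{(n)} \frac{V_{i_m + n, m}}{V_{i_m, m}}. \tag{44}$$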

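Proof Let $\Pi$ be an EGP$(\alpha, V)$ and denote by $Y_j$, $j = 1, 2, \ldots$, the sequence of $\{0, 1\}$-valued indicators such that $Y_j = 1$ if $j$ is a record index of $\Pi$. Then $Y_1 = 1$ and, for every $l$ and $m \le l$,

$$\mathbb{P}\Big(Y_{l+1} = 0 \,\Big|\, \sum_{i=1}^{l} Y_i = m\Big) = (l - \alpha m) \frac{V_{l+1, m}}{V_{l, m}} = 1 - \mathbb{P}\Big(Y_{l+1} = 1 \,\Big|\, \sum_{i=1}^{l} Y_i = m\Big).$$

By Proposition 1 and formula (13), given the cumulative frequencies $W = W_1, W_2, \ldots$,

$$\mathbb{P}\Big(Y_{l+1} = 0 \,\Big|\, \sum_{i=1}^{l} Y_i = m, W\Big) = 1 - \mathbb{P}\Big(Y_{l+1} = 1 \,\Big|\, \sum_{i=1}^{l} Y_i = m, W\Big) = W_m$$

(see also [24], Theorem 6). This also implies that, conditional on $W$, the random sequence $K_l := \sum_{i=1}^{l} Y_i$ ($l = 1, 2, \ldots$) is Markov, so we can write, for every $l, m$,

$$\mathbb{P}(Y_{l+1} = 0 \mid K_l = m, K_{l-1} = m - \epsilon, W) = \mathbb{P}(Y_{l+1} = 0 \mid K_l = m, W) = W_m, \qquad \epsilon = 0, 1. \tag{45}$$

Hence

$$\begin{aligned}
\mathbb{E}(W_m \mid i_m) &= \mathbb{E}\left[ \mathbb{P}(Y_{i_m + 1} = 0 \mid K_{i_m} = m, K_{i_m - 1} = m - 1, W) \right] \\
&= \mathbb{E}\left[ \mathbb{P}(Y_{i_m + 1} = 0 \mid K_{i_m} = m, W) \right] \\
&= (i_m - \alpha m) \frac{V_{i_m + 1, m}}{V_{i_m, m}},
\end{aligned} \tag{46}$$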
which proves (43) for $n = 1$. The Markov property of $K_n$ and (45) also lead, for every $m$, to

$$\begin{aligned}
\mathbb{E}(W_m^n \mid i_m) &= \mathbb{E}\left[ \mathbb{P}(Y_{i_m + 1} = \cdots = Y_{i_m + n} = 0 \mid K_{i_m} = m, K_{i_m - 1} = m - 1, W) \right] \\
&= \prod_{l=1}^{n} \mathbb{E}\left[ \mathbb{P}(Y_{i_m + l} = 0 \mid K_{i_m + l - 1} = m, W) \right] \\
&= (i_m - \alpha m)_{(n)} \frac{V_{i_m + n, m}}{V_{i_m, m}},
\end{aligned}$$

where the last equality is obtained as an $n$-fold iteration of (46).

The second part of the Lemma (formula (44)) follows from Proposition 3:

$$\mathbb{E}(X_m^n \mid i_m) = \mathbb{E}(\xi_{m-1}^n \mid i_m)\, \mathbb{E}(W_m^n \mid i_m) = \frac{(1 - \alpha)_{(n)}}{(i_m - \alpha m)_{(n)}}\, \mathbb{E}(W_m^n \mid i_m),$$

which combined with (43) completes the proof.
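To make formula (43) concrete, the following sketch evaluates its right-hand side in the special case $\alpha = 0$ with the weights $V_{n,k} = \theta^k / \theta_{(n)}$ of Kingman's model (the form quoted in Corollary 3 below). Under that assumption the expression reduces, by cancellation of rising factorials, to $(i_m)_{(n)} / (\theta + i_m)_{(n)}$, which is the $n$-th moment of a Beta$(i_m, \theta)$ variable. The helper names are hypothetical and the check is purely arithmetic.

```python
from fractions import Fraction


def rising(a, n):
    """Rising factorial a_(n) = a (a+1) ... (a+n-1), with a_(0) = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= a + i
    return out


def V_ewens(theta, n, k):
    """Assumed Ewens weights V_{n,k} = theta^k / theta_(n) (alpha = 0)."""
    theta = Fraction(theta)
    return theta ** k / rising(theta, n)


def moment_W(theta, m, i_m, n):
    """Right-hand side of (43) with alpha = 0: (i_m)_(n) V_{i_m+n,m} / V_{i_m,m}."""
    return (rising(Fraction(i_m), n)
            * V_ewens(theta, i_m + n, m) / V_ewens(theta, i_m, m))


def beta_moment(a, b, n):
    """n-th moment of a Beta(a, b) random variable: a_(n) / (a+b)_(n)."""
    a, b = Fraction(a), Fraction(b)
    return rising(a, n) / rising(a + b, n)


if __name__ == "__main__":
    theta = Fraction(3, 2)
    print(moment_W(theta, m=2, i_m=5, n=3), beta_moment(5, theta, 3))
```

Note that in this special case the result does not depend on $m$ once $i_m$ is fixed, because the factor $\theta^m$ cancels in the ratio of weights.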

Given the coefficients $\{V_{n,k}\}$, formulas analogous to (43) and (44) can be obtained to describe the conditional Mellin transforms of $W_m$ and $X_m$ (respectively), in terms of the Mellin transform of the size-biased pick $X_1$.

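Proposition 5 Let $X_1, X_2, \ldots$ be the limit age-ordered frequencies generated by a Gibbs partition with parameters $\alpha, V$. For every $m = 1, 2, \ldots$ and $\phi \ge 0$,

$$\mathbb{E}(W_m^\phi \mid i_m) = (i_m - \alpha m)_{(\phi)} \frac{V_{i_m, m}[\phi]}{V_{i_m, m}} \tag{47}$$

and

$$\mathbb{E}(X_m^\phi \mid i_m) = (1 - \alpha)_{(\phi)} \frac{V_{i_m, m}[\phi]}{V_{i_m, m}} \tag{48}$$

for a sequence of functions $(V_{n,k}[\cdot] : \mathbb{R} \to \mathbb{R};\ k, n = 1, 2, \ldots)$, uniquely determined by $V_{n,k} = V_{n,k}[0]$, such that, for every $\phi \ge 0$,

$$V_{1,1}[\phi] = \frac{\mathbb{E}(X_1^\phi)}{(1 - \alpha)_{(\phi)}}; \tag{49}$$

$$V_{n,k}[\phi] = (n + \phi - \alpha k)\, V_{n+1,k}[\phi] + V_{n+1,k+1}[\phi], \qquad n, k = 1, 2, \ldots; \tag{50}$$

$$V_{n,k}[\phi + 1] = V_{n+1,k}[\phi], \qquad n, k = 1, 2, \ldots. \tag{51}$$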
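For integer $\phi$ the functions of Proposition 5 reduce to shifted weights, $V_{n,k}[\phi] = V_{n+\phi,k}$, so the recursions (50) and (51) can be checked directly. The sketch below does this in exact arithmetic for Pitman's two-parameter model, assuming the standard weights $V_{n,k} = \prod_{i=1}^{k-1}(\theta + i\alpha) / (\theta+1)_{(n-1)}$; this choice of weights and the function names are assumptions made for illustration, not part of the statement above.

```python
from fractions import Fraction


def rising(a, n):
    """Rising factorial a_(n) = a (a+1) ... (a+n-1), with a_(0) = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= a + i
    return out


def V_two_param(alpha, theta, n, k):
    """Assumed two-parameter weights: prod_{i<k}(theta + i alpha) / (theta+1)_(n-1)."""
    num = Fraction(1)
    for i in range(1, k):
        num *= theta + i * alpha
    return num / rising(theta + 1, n - 1)


def V_phi(alpha, theta, n, k, phi):
    """Integer-phi version V_{n,k}[phi] = V_{n+phi,k}."""
    return V_two_param(alpha, theta, n + phi, k)


def recursion_50_holds(alpha, theta, n, k, phi):
    """Check V_{n,k}[phi] = (n + phi - alpha k) V_{n+1,k}[phi] + V_{n+1,k+1}[phi]."""
    lhs = V_phi(alpha, theta, n, k, phi)
    rhs = ((n + phi - alpha * k) * V_phi(alpha, theta, n + 1, k, phi)
           + V_phi(alpha, theta, n + 1, k + 1, phi))
    return lhs == rhs


if __name__ == "__main__":
    alpha, theta = Fraction(1, 2), Fraction(1)
    print(all(recursion_50_holds(alpha, theta, n, k, phi)
              for n in range(1, 6) for k in range(1, n + 1) for phi in range(4)))
```

With $\phi = 0$ the same check is just the backward recursion satisfied by the $V$-coefficients of any Gibbs partition.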

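Remark 3 To complete the representation given in Proposition 5, notice that, for every $\alpha$, the distribution of $X_1$ (the so-called structural distribution) is known for the extreme points of the Gibbs$(\alpha, V)$ family. In particular, for $\alpha \le 0$, $X_1$ has a Beta$(1 - \alpha, \theta + \alpha)$ density ($\theta > 0$), where $\theta = m|\alpha|$ for some integer $m$ when $\alpha < 0$ (see e.g. [26]). In this case,

$$V_{1,1}[\phi] = \frac{1}{(1 - \alpha)_{(\phi)}} \cdot \frac{(1 - \alpha)_{(\phi)}}{(\theta + 1)_{(\phi)}} = \frac{1}{(\theta + 1)_{(\phi)}}.$$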

When a > 0, we saw in the introduction that every extreme point in the Gibbs family is a Poisson- 
Kingman (a, s) partition for some s > 0; in this case the density of X\ is 

for an a-stable density f a ([27], (57)), leading to 

V ia [<j>}(s) = as^rG a {<t> - a - 1, s" 1 /") 

where G a is as in ([7]). 

Thus, for every $\alpha$, the structural distribution of a Gibbs$(\alpha, V)$ partition, which defines $V_{1,1}[\phi]$ in (49), can be obtained as a mixture of the corresponding extreme structural distributions.
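Proof Note that, for $\phi = 0, 1, 2, \ldots$, the proposition holds by Lemma 4 with $V_{n,m}[\phi] = V_{n+\phi, m}$. For general $\phi \ge 0$ observe that, for every $m, n \in \mathbb{N}$,

$$\mathbb{E}(W_m^{\phi + n - m} \mid i_m = m) = \mathbb{E}(W_m^{\phi} \mid i_m = n)\, \mathbb{E}(W_m^{n - m} \mid i_m = m). \tag{52}$$

To see this, consider the random sequences $Y_n$, $K_n$ defined in the proof of Lemma 4. By (45),

$$\begin{aligned}
\mathbb{E}(W_m^{\phi + n - m} \mid i_m = m) &= \mathbb{E}\left[ W_m^{\phi}\, \mathbb{P}(K_n = m \mid K_m = m, W) \mid K_m = m \right] \\
&= \mathbb{E}\left[ W_m^{\phi}\, \mathbb{P}(K_n = m \mid W) \mid K_m = m \right] \\
&= \mathbb{E}\left[ W_m^{\phi} \mid K_n = m, K_m = m \right] \mathbb{E}\left[ \mathbb{P}(K_n = m \mid W) \mid K_m = m \right] \\
&= \mathbb{E}\left[ W_m^{\phi} \mid K_n = m \right] \mathbb{E}\left[ W_m^{n - m} \mid K_m = m \right] \\
&= \mathbb{E}\left[ W_m^{\phi} \mid i_m = n \right] \mathbb{E}\left[ W_m^{n - m} \mid K_m = m \right]
\end{aligned}$$

where the last two equalities follow from (45), the Markov property of $K_n$ and the exchangeability of the $Y$'s.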

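From Lemma 4 we can rewrite (52) as

$$\mathbb{E}(W_m^{\phi} \mid i_m = n) = \frac{V_{m,m}\, \mathbb{E}(W_m^{\phi + n - m} \mid i_m = m)}{[m(1 - \alpha)]_{(n - m)}\, V_{n,m}}. \tag{53}$$

Now define

$$M_m(\phi) = \mathbb{E}(W_m^{\phi} \mid i_m = m), \qquad \phi \ge 0,\ m = 1, 2, \ldots \tag{54}$$

and

$$V_{n,m}[\phi] = \frac{V_{m,m}\, M_m(\phi + n - m)}{[m(1 - \alpha)]_{(n + \phi - m)}}, \qquad \phi \ge 0,\ n, m = 1, 2, \ldots. \tag{55}$$

Notice that, with such a definition, Lemma 4 implies that $V_{n,m}[0] = V_{n,m}$. Moreover, $V_{n,m}[\phi + 1] = V_{n+1,m}[\phi]$, so (51) is satisfied; then (53) reads

$$\mathbb{E}(W_m^{\phi} \mid i_m) = (i_m - \alpha m)_{(\phi)} \frac{V_{i_m, m}[\phi]}{V_{i_m, m}}, \tag{56}$$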

that is: (|39] ) .([3? | and are satisfied. 

Now it only remains to prove that such a choice of $V_{n,m}[\phi]$ obeys the recursion (50) for every $n, m, \phi$. By the same arguments leading to (52),

$$\begin{aligned}
M_m(\phi) - M_m(\phi + 1) &= \mathbb{E}\left[ W_m^{\phi} (1 - W_m) \mid K_m = m \right] \\
&= \mathbb{E}\left[ W_m^{\phi} \left( 1 - \mathbb{P}(K_{m+1} = m \mid W) \right) \mid K_m = m \right] \\
&= \mathbb{E}\left[ W_m^{\phi} \mid K_{m+1} = m + 1, K_m = m \right] \mathbb{E}\left[ 1 - \mathbb{P}(K_{m+1} = m \mid W) \mid K_m = m \right] \\
&= \mathbb{E}\left[ (1 - \xi_m)^{\phi} \mid i_{m+1} = m + 1 \right] \mathbb{E}\left[ W_{m+1}^{\phi} \mid K_{m+1} = m + 1 \right] \frac{V_{m+1, m+1}}{V_{m, m}}.
\end{aligned} \tag{57}$$

The last equality is a consequence of Proposition 3, by which, conditional on $K_{m+1} = m + 1$ (which is equivalent to $\{i_{m+1} = m + 1\}$), $W_m = (1 - \xi_m) W_{m+1}$ for $\xi_m$ (independent of $W_{m+1}$) having a Beta$(1 - \alpha, m(1 - \alpha))$ distribution. Therefore (57) can be rewritten as

$$M_m(\phi) - M_m(\phi + 1) = \frac{[m(1 - \alpha)]_{(\phi)}}{[(m + 1)(1 - \alpha)]_{(\phi)}}\, M_{m+1}(\phi)\, \frac{V_{m+1, m+1}}{V_{m, m}} \tag{58}$$

and (50) follows from (58) and (51), after some simple algebra, by comparing the definition (55) of $V_{n,k}[\phi]$ and the recursion (3) for the $V$-coefficients of an EGP$(\alpha, V)$. In particular, this shows that the functions $V_{n,k}[\phi]$ are uniquely determined by $V = (V_{n,k})$.

The equality (48) can now be proved in the same way as the moment formula (44).
4.1 Convolution structure.



An alternative representation of $-\log X_m$ can be given in terms of an infinite sum of random variables, as done by [13] for Kingman's model (Ewens' partition). Formula (42) in [13] shows that, conditional on $i_j$,

$$-\log(X_j) = \sum_{i=2}^{i_j} Y_i + \sum_{m = i_j + 1}^{\infty} Z_m, \tag{59}$$

where $\{Y_i\}$ is a sequence of independent exponential random variables such that the $i$-th component has rate $i - 1$, and $\{Z_m\}$ is a sequence of mutually independent random variables whose distribution has an atom at zero and a continuous density on the positive reals:

$$f_i(w) = \frac{1}{1 + \rho_i}\, I(w = 0) + \frac{\rho_i}{1 + \rho_i}\, (i - 1) e^{-(i - 1) w}\, I(w > 0), \tag{60}$$

where $\rho_i = \theta / (i - 1)$ ([14], p. 173).

With the help of (33) it is possible to show that a similar representation holds true for general Gibbs partitions, but that the collection of the $Y$'s is split into two subsequences. It turns out that the independence property of the $Z$'s actually characterizes Kingman's model: for general Gibbs partitions with parameters $\alpha, V$, the $Z$'s are stochastically linked by the sequence $K = (K_n : n = 1, 2, \ldots)$ as defined in the proof of Lemma 4. Remember that $K$ is a Markov process on $\mathbb{N}$ such that $K_1 = 1$ and, for $j > 1$,

$$p_{kj} = \mathbb{P}(K_j = k + 1 \mid K_{j-1} = k) = 1 - \mathbb{P}(K_j = k \mid K_{j-1} = k) = \frac{V_{j, k+1}}{V_{j-1, k}}. \tag{61}$$



Proposition 6 Let $X = X_1, X_2, \ldots$ be the limit relative frequencies generated by a Gibbs partition with parameters $\alpha, V$. Conditionally on $i_m$,

$$-\log X_m = \sum_{j=2}^{m} Y_j^* + \sum_{i = m+1}^{i_m} Y_i^{(m)} + \sum_{l = i_m + 1}^{\infty} Z_l \tag{62}$$

where: the $\{Y_j^*\}$ are independent random variables, with distribution, respectively,

$$f_j^*(y)\, dy = \frac{e^{-y(j - 1 - \alpha(j - 1))} (1 - e^{-y})^{-\alpha}}{B(j - 1 - \alpha(j - 1), 1 - \alpha)}\, I(y > 0)\, dy; \tag{63}$$

the $\{Y_i^{(m)}\}$ are independent random variables, with distribution, respectively,

$$f_i^{(\alpha, m)}(y)\, dy = (i - \alpha m - 1)\, e^{-y(i - 1 - \alpha m)}\, I(y > 0)\, dy, \tag{64}$$

and the $\{Z_l\}$ are such that: (i) $Z_l$ is conditionally independent of $Z_{l-1}$, given $K_{l-1}$; (ii) conditionally on $\{K_{l-1} = k, K_l = v\}$, the distribution of $Z_l$ is

$$g_{l,k,v}(z)\, dz = \delta_{kv}\, \delta_0(dz) + (1 - \delta_{kv}) \frac{e^{-z(l - 1 - \alpha k)} (1 - e^{-z})^{-\alpha}}{B(l - 1 - \alpha k, 1 - \alpha)}\, I(z > 0)\, dz. \tag{65}$$

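Proof From Proposition 3, and the properties of Beta distributions,

$$\mathbb{E}(e^{\phi \log X_m} \mid i_m) = \mathbb{E}\Big( \xi_{m-1}^{\phi} \prod_{k=m}^{\infty} (1 - \xi_k)^{\phi} \,\Big|\, i_m \Big) = \frac{(1 - \alpha)_{(\phi)}}{(i_m - \alpha m)_{(\phi)}}\, \mathbb{E}\Big( \prod_{k=m}^{\infty} \frac{(i_{k+1} - \alpha k - 1)_{(\phi)}}{(i_{k+1} - \alpha(k + 1))_{(\phi)}} \,\Big|\, i_m \Big). \tag{66}$$

The first fraction on the right-hand side can be decomposed as:

$$\frac{(1 - \alpha)_{(\phi)}}{(i_m - \alpha m)_{(\phi)}} = \Big( \prod_{j=2}^{m} \frac{(j - 1 - \alpha(j - 1))_{(\phi)}}{(j - \alpha j)_{(\phi)}} \Big) \Big( \prod_{i = m+1}^{i_m} \frac{(i - 1 - \alpha m)_{(\phi)}}{(i - \alpha m)_{(\phi)}} \Big) = \Big( \prod_{j=2}^{m} \mathbb{E}\big( (\tilde{Y}_j^*)^{\phi} \big) \Big) \Big( \prod_{i = m+1}^{i_m} \mathbb{E}\big( (\tilde{Y}_i^{(m)})^{\phi} \big) \Big), \tag{67}$$

where, for each $j$,

$$\tilde{Y}_j^* \sim \text{Beta}(j - 1 - \alpha(j - 1), 1 - \alpha),$$

and for each $i$,

$$\tilde{Y}_i^{(m)} \sim \text{Beta}(i - 1 - \alpha m, 1).$$

Now simply set $Y_j^* := -\log \tilde{Y}_j^*$ and $Y_i^{(m)} := -\log \tilde{Y}_i^{(m)}$ to see that

$$\frac{(1 - \alpha)_{(\phi)}}{(i_m - \alpha m)_{(\phi)}} = \mathbb{E}\Big( e^{-\phi \left( \sum_{j=2}^{m} Y_j^* + \sum_{i = m+1}^{i_m} Y_i^{(m)} \right)} \Big),$$

which provides the first two sums in the representation (62).

Define $\chi_l = K_l - K_{l-1}$ where $K_l$ is described by (61). Then

$$\mathbb{E}\Big( \prod_{k=m}^{\infty} \frac{(i_{k+1} - \alpha k - 1)_{(\phi)}}{(i_{k+1} - \alpha(k + 1))_{(\phi)}} \,\Big|\, i_m \Big) = \mathbb{E}\Big( \prod_{l = i_m + 1}^{\infty} \Big[ \frac{(l - \alpha K_l - (1 - \alpha))_{(\phi)}}{(l - \alpha K_l)_{(\phi)}} \Big]^{\chi_l} \,\Big|\, i_m \Big) = \mathbb{E}\Big( \prod_{l = i_m + 1}^{\infty} \mathbb{E}\big( \tilde{Z}_l^{\phi \chi_l} \mid K_l, K_{l-1} \big) \,\Big|\, i_m \Big),$$

where each $\tilde{Z}_l$ is a Beta random variable with parameters $(l - \alpha K_l - (1 - \alpha), 1 - \alpha)$. Now define

$$Z_l = -\chi_l \log \tilde{Z}_l.$$

Then $Z_l$ has the required distribution (65) and the last expectation can be written as

$$\mathbb{E}\Big( \prod_{l = i_m + 1}^{\infty} \mathbb{E}\big( e^{-\phi Z_l} \mid K_l, K_{l-1} \big) \,\Big|\, i_m \Big) = \mathbb{E}\Big( e^{-\phi \sum_{l = i_m + 1}^{\infty} Z_l} \,\Big|\, i_m \Big),$$

which completes the proof.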
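The decomposition of the ratio $(1-\alpha)_{(\phi)} / (i_m - \alpha m)_{(\phi)}$ into a product of Beta moments is a telescoping identity, and it can be checked numerically for non-integer $\phi$. The sketch below is only an illustration: it takes the generalized rising factorial to be $a_{(\phi)} = \Gamma(a + \phi)/\Gamma(a)$ and uses the Beta moment formula $\mathbb{E}(B^\phi) = a_{(\phi)}/(a+b)_{(\phi)}$ for $B \sim$ Beta$(a, b)$; the function names are hypothetical.

```python
import math


def rising(a, phi):
    """Generalized rising factorial Gamma(a + phi) / Gamma(a), for a > 0, phi >= 0."""
    return math.exp(math.lgamma(a + phi) - math.lgamma(a))


def ratio(alpha, m, i_m, phi):
    """(1 - alpha)_(phi) / (i_m - alpha m)_(phi)."""
    return rising(1 - alpha, phi) / rising(i_m - alpha * m, phi)


def product_of_beta_moments(alpha, m, i_m, phi):
    """Product of phi-th moments of Beta((j-1)(1-alpha), 1-alpha), j = 2..m,
    and of Beta(i - 1 - alpha m, 1), i = m+1..i_m."""
    out = 1.0
    for j in range(2, m + 1):
        out *= rising((j - 1) * (1 - alpha), phi) / rising(j * (1 - alpha), phi)
    for i in range(m + 1, i_m + 1):
        out *= rising(i - 1 - alpha * m, phi) / rising(i - alpha * m, phi)
    return out


if __name__ == "__main__":
    args = dict(alpha=0.3, m=3, i_m=7, phi=1.7)
    print(ratio(**args), product_of_beta_moments(**args))
```

The first loop telescopes from $(1-\alpha)_{(\phi)}$ to $[m(1-\alpha)]_{(\phi)}$ and the second from $(m - \alpha m)_{(\phi)}$ to $(i_m - \alpha m)_{(\phi)}$, which is why the two quantities agree up to rounding.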

The characterization of Kingman's model follows immediately. 

Corollary 3 Let $X = X_1, X_2, \ldots$ be the limit relative frequencies generated by a Gibbs partition with parameters $\alpha, V$. The random variables involved in the three sums in (62) are all mutually independent if and only if $\alpha = 0$ and $V$ is such that, for every $n$ and $k \le n$, $V_{n,k} = \theta^k / \theta_{(n)}$ for some $\theta > 0$.

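Proof We only have to verify the independence of the $Z$'s. In general, $Z_l$ depends on $Z_{l-1}$ only through $K_{l-1}$ ($l > i_m$). By averaging over $K_l$, one has that the conditional distribution of $Z_l$ given $\{K_{l-1} = k\}$ is:

$$(1 - p_{kl})\, \delta_0(dz) + p_{kl}\, \frac{e^{-z(l - 1 - \alpha k)} (1 - e^{-z})^{-\alpha}}{B(l - 1 - \alpha k, 1 - \alpha)}\, I(z > 0)\, dz. \tag{68}$$

But remember that

$$1 - p_{kl} = (l - \alpha k - 1) \frac{V_{l,k}}{V_{l-1,k}}$$

does not depend on $k$ only if $\alpha = 0$ and

$$\frac{V_{l,k}}{V_{l-1,k}} = c_{l-1}, \qquad k, l = 1, 2, \ldots$$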
for some constant $c_{l-1}$. This implies that the $V$-coefficients are of the form

$$V_{n,k} = c_n^{-1} V_k^{*}$$

but, as mentioned in the introduction, within the family of all EGPs, only Pitman's two-parameter partitions have such a form. Since we just saw that $\alpha$ is necessarily zero, it must be

$$V_{n,k} = \frac{\theta^k}{\theta_{(n)}}$$

for some $\theta > 0$, which corresponds exactly to Kingman's model. In this case, the distribution (68) reduces to

$$\frac{l - 1}{l + \theta - 1}\, \delta_0(dz) + \frac{\theta}{l + \theta - 1}\, (l - 1)\, e^{-z(l - 1)}\, dz. \tag{69}$$

The converse is straightforward.
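The role of $\alpha = 0$ in this argument can be seen by computing the new-block probabilities $p_{kl} = V_{l,k+1}/V_{l-1,k}$ explicitly. The sketch below assumes the two-parameter weights $V_{n,k} = \prod_{i=1}^{k-1}(\theta + i\alpha)/(\theta+1)_{(n-1)}$, which coincide with $\theta^k/\theta_{(n)}$ when $\alpha = 0$; under this assumption $p_{kl} = (\theta + k\alpha)/(\theta + l - 1)$, which is free of $k$ exactly when $\alpha = 0$. The names are illustrative.

```python
from fractions import Fraction


def rising(a, n):
    """Rising factorial a_(n) = a (a+1) ... (a+n-1), with a_(0) = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= a + i
    return out


def V_two_param(alpha, theta, n, k):
    """Assumed two-parameter weights: prod_{i<k}(theta + i alpha) / (theta+1)_(n-1)."""
    num = Fraction(1)
    for i in range(1, k):
        num *= theta + i * alpha
    return num / rising(theta + 1, n - 1)


def new_block_prob(alpha, theta, l, k):
    """p_{kl} = V_{l,k+1} / V_{l-1,k}, for l >= 2 and 1 <= k <= l - 1."""
    return V_two_param(alpha, theta, l, k + 1) / V_two_param(alpha, theta, l - 1, k)


if __name__ == "__main__":
    theta, l = Fraction(2), 6
    for alpha in (Fraction(0), Fraction(1, 2)):
        print(alpha, [new_block_prob(alpha, theta, l, k) for k in range(1, l)])
```

For $\alpha = 0$ every entry of the printed list equals $\theta/(\theta + l - 1)$, matching the weight of the continuous part in (69).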

Remark 4 Notice that, when $\alpha = 0$, the $Y_j^*$'s and the $Y_i^{(m)}$'s are all exponential random variables, with parameters $(j - 1)$ and $(i - 1)$, respectively. This, together with (69), leads us back to Griffiths and Lessard's
representation (|59")) - (f6T))) . 



5 A representation for normalized age-ordered frequencies in an exchangeable Gibbs 
partition. 

In this section we provide a characterization of the density of the first $k$ (normalized) age-ordered frequencies, given $i_k$ and $W_k$, and an explicit formula for the marginal distribution of $i_k$. We give a direct proof, obtained by comparison of the unconditional distribution of $X_1, \ldots, X_k$ ($k = 1, 2, \ldots$) in a general Gibbs partition with its analogue in Pitman's two-parameter model. Such a comparison is naturally induced by Proposition 3, which says that, conditional on the record indices, the distribution of the age-ordered frequencies is the same for every Gibbs partition. Remember that the limit (unconditional) age-ordered frequencies in Pitman's two-parameter family are described by the two-parameter GEM distribution, for which

$$X_j = B_j \prod_{i=1}^{j-1} (1 - B_i)$$

for a sequence $B_1, B_2, \ldots$ of independent Beta random variables with parameters, respectively, $(1 - \alpha, \theta + j\alpha)$ (see e.g. [H]).
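The GEM construction is a stick-breaking scheme and is easy to simulate. The sketch below is a minimal illustration using only the standard library; the function name, the seed and the parameter values are arbitrary choices, and the parameters are assumed to satisfy $0 \le \alpha < 1$ and $\theta > -\alpha$ so that every Beta parameter is positive.

```python
import random


def gem_frequencies(alpha, theta, k, rng):
    """First k age-ordered frequencies X_j = B_j * prod_{i<j} (1 - B_i),
    with independent B_j ~ Beta(1 - alpha, theta + j * alpha).

    Returns the list of frequencies and the unbroken remainder 1 - W_k.
    """
    freqs = []
    remaining = 1.0  # prod_{i<j} (1 - B_i), i.e. 1 - W_{j-1}
    for j in range(1, k + 1):
        b = rng.betavariate(1 - alpha, theta + j * alpha)
        freqs.append(b * remaining)
        remaining *= 1 - b
    return freqs, remaining


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    freqs, remainder = gem_frequencies(alpha=0.5, theta=1.0, k=10, rng=rng)
    print(freqs)
    print("W_k =", sum(freqs), "remainder =", remainder)
```

By construction the cumulative frequency satisfies $W_k = 1 - \prod_{i \le k}(1 - B_i)$, so the frequencies and the remainder always add up to one.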

Proposition 7 Let $X_1, X_2, \ldots$ be the age-ordered frequencies of an EGP$(\alpha, V)$ and, for every $k$, let $W_k = \sum_{i=1}^{k} X_i$.
(i) Conditional on $W_k = w$ and on $i_k = k + i$, the law of the vector $X_1, \ldots, X_k$ is

dfi Q ,v(xi, ■ ■ ■ ,x k \w,k+i) = Vk+hk — w -( k -V \^ p, a y(m)T> (m _ a) ( — , . . . , — )dxj, . . . ,dx k -i 

Vk+i-i,k-i „ k , , . , w w 

(70) 

where: $\mathcal{D}_{(m - \alpha)}$ is the Dirichlet density with parameters $(m_1 - \alpha, \ldots, m_{k-1} - \alpha, 1 - \alpha)$, and $\mu_{\alpha,V}(m)$ is the age-ordered sampling formula (15) with Gibbs' EPPF $q_{\alpha,V}$ as in (13):

\IL=i (™j)K k + 1 - 1 - Ei=i m) J 



(ii) The marginal distribution of ik is 



a -(k-i) /_ 1 y+fc + i_i 
lk = k + l ) = W (fc+ ._ 1} , L Jy— fa Wi] > (71) 



j'=o 

where $a_{[n]} = a(a - 1) \cdots (a - n + 1)$.
Remark 5 The density of $i_k$ can be expressed in terms of generalized Stirling numbers [£], defined as
the coefficients of x n in 

' {i-(i-xY) k 



( [H]j[I3])- Formula (jTTj) can be re-expressed as: 

Hh) = v ik!k 



ik - 1 
fc - 1 



which makes clear the connection between the distribution of ik and the distribution of K n recalled in 
the introduction (formula ((8])). In fact, ([7T|) can be deduced simply from ([8|) by the Markov property of 
the sequence K n as 

p(* fc ) = = fc i K ik -! = k-i) nK lk -i = k - 1) 



■Vi 



i fc _i,fc-l 



4 - 1 

k - 1 



However here we give a self-contained proof in order to show how (|71j) is implied by proposition[3]through 

cm}. 



Proof From (10), we know that

$$\mathbb{P}(i_2, \ldots, i_k) = V_{i_k, k}\, \frac{\Gamma(i_2 - \alpha - 1) \cdots \Gamma(i_k - \alpha(k - 1) - 1)}{\Gamma(1 - \alpha)\, \Gamma(i_2 - 2\alpha) \cdots \Gamma(i_{k-1} - (k - 1)\alpha)}.$$



By proposition [3] and Lemma 31 



fc-i 



3=1 



v3=l 



i—k 



ll,...,l k 



fc-1 



n (1 7 ) '?"'^'X' 1) ' s ^ (nc-^fe) 



3=1 



fc-1 

n 



fe+i - (j + l)«)(s J+I ) 
(1 - a)( ni+1 )(ij+i - J'a - l)(s 3 .) \ r(i fc +n - ak) V n+ih>k 



(ij+i - U + 1 )«)(s i+1 ) 



r{i k -ak) V lk:k 



(72) 



hence 



E ( J]X^|l = u,i 2 ,...,i fe )P(i 2 ,...,4) 

3=1 

fe-1 



(ik - ak) (n) V n+ikt k Y[ 



3 = 1 



(1 - 0)^7^+1 - ja - 1 + Sj)r(i j+1 - a{j + f)) 
r(i j+1 - a(j + 1) + S , J+ i) J r(i j - aj) 



lie 1 -°)(nj) 

3=1 



fe-1 



3=1 



(73) 



where the last equality follows after multiplying and dividing all terms by $(1 - \alpha)_{(n_j)}$. Consequently, a moment formula for general Gibbs partitions is of the form:



E 



(a,V) 



vi=l / 3=1 Ki2 <-..«» 



(74) 



where, for 1 < j ' < k — 1, 
For fixed $i_k$ denote

^ik = C {hM) ' " c (ik-i,ik)' 
Ki2<---<ik 

Then (JH)) reads 

(k \ k OO 

n x ? = ri( i e a *^w- (75) 

j=l y j=l i=k 

For Pitman's two-parameter family, this becomes 

f (firA TTn ^ V> n; = i(g + "(3-i)) 

E («i) Ml x i = lit 1 -«)(n,) 2^ A « z; — : ■ ( 76 ) 

\3=1 / 3=1 

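The two-parameter GEM distribution implies that

$$\begin{aligned}
\mathbb{E}\Big( \prod_{j=1}^{k} X_j^{n_j} \Big) &= \mathbb{E}\Big( \prod_{j=1}^{k} B_j^{n_j} (1 - B_j)^{n - S_j} \Big) \\
&= \prod_{j=1}^{k} \frac{(1 - \alpha)_{(n_j)}\, (\theta + j\alpha)_{(n - S_j)}}{(\theta + 1 + (j - 1)\alpha)_{(n - S_{j-1})}} \\
&= \frac{1}{\theta_{(n)}} \prod_{j=1}^{k} \frac{(1 - \alpha)_{(n_j)}\, (\theta + \alpha(j - 1))}{\theta + \alpha(j - 1) + n - S_{j-1}},
\end{aligned} \tag{77}$$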
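The passage from the product of Beta moments to the closed form with prefactor $1/\theta_{(n)}$ uses $(x+1)_{(N)} = x_{(N)}(x+N)/x$ followed by a telescoping cancellation. As a hedged check, the sketch below compares the two expressions in exact arithmetic, assuming $\theta > 0$, $S_j = n_1 + \cdots + n_j$ with $S_0 = 0$, and $n = S_k$; the function names are illustrative.

```python
from fractions import Fraction


def rising(a, n):
    """Rising factorial a_(n) = a (a+1) ... (a+n-1), with a_(0) = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= a + i
    return out


def gem_moment_beta_form(alpha, theta, ns):
    """prod_j (1-alpha)_(n_j) (theta + j alpha)_(n - S_j) / (theta + 1 + (j-1) alpha)_(n - S_{j-1})."""
    n, S_prev, out = sum(ns), 0, Fraction(1)
    for j, nj in enumerate(ns, start=1):
        S = S_prev + nj
        out *= (rising(1 - alpha, nj) * rising(theta + j * alpha, n - S)
                / rising(theta + 1 + (j - 1) * alpha, n - S_prev))
        S_prev = S
    return out


def gem_moment_closed_form(alpha, theta, ns):
    """(1 / theta_(n)) prod_j (1-alpha)_(n_j) (theta + alpha (j-1)) / (theta + alpha (j-1) + n - S_{j-1})."""
    n, S_prev = sum(ns), 0
    out = 1 / rising(theta, n)
    for j, nj in enumerate(ns, start=1):
        base = theta + alpha * (j - 1)
        out *= rising(1 - alpha, nj) * base / (base + n - S_prev)
        S_prev += nj
    return out


if __name__ == "__main__":
    alpha, theta = Fraction(1, 4), Fraction(2)
    ns = (2, 0, 3, 1)  # exponents n_1, ..., n_k (nonnegative integers)
    print(gem_moment_beta_form(alpha, theta, ns) == gem_moment_closed_form(alpha, theta, ns))
```

For a single exponent $n_1 = 1$ both forms give $(1-\alpha)/(\theta+1)$, the mean of the size-biased pick $X_1 \sim$ Beta$(1-\alpha, \theta+\alpha)$.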

therefore, from (|76|) we derive the identity: 

Jl i{ 9 + a(j-l) + n-S^)=^ X ^ 9> °- (78) 

For > replace 9 by — n in (l78|) . and denote n* = 1 — a + rij. We now find an expansion of the 

left-hand side of {75]) in terms of products of the type J^i ( n jO(m 3 )i f° r m i> ■ ■ ■ i m k > 0. The left-hand 
side of ([78)1 is now 

k , fc 



11 ffl _ <n 4- rvf-i _ 1 ^ 4- -n - .<?* ^ II 



L (0 - n + a(j - 1) + n - S;_ x ) ^ (0 + j - 1 + ^_ x ) 
fc /.l 

3=1 J ° 

fc-i \ ( fc-i 

n *j J f* 1 • • • **-i)~ nI (*a • • • ife-i)"" 3 • • • cr 1 n *3 dt ° • • • dtk -!> ( 79 ) 

where Sq = and S** = Y^i=i n *j-> J = l,---,fc — 1- Make the change of variable 

fc-i 

uj = 1 - H*i, j = 0, ...,k- 1. 

«=3 

Then < Uk-x < ... <uq<\. The absolute value of the Jacobian is 

(tl • • -tfc-l) x (t 2 • • ■ *Jfc— i) x tfe-l x 1 
fc-i 

3=1 



(Zft 
rf 7 
Thus (75) transforms to



. fe-i 

/ (1 - uo) 9 - 1 u)- n U Ul ■ ■ • dufc_iduo. 

J0<«fc-i<...<u o <l »_i 



J'=l 

Fix uo and consider 

fc-i 



f JJ(l-tt) m *d«i---du fc _i 

-'0<Mfc_l<...<!iO j—l 



mj_i + 1 TOj_i + mj_ 2 + 1 mj_i + . . . + mi + 1 

The integral in (|79l) is thus 

S n^f" i n /V - «o> 8 -'«r i+EK 

rai,.,mn>0 1 li=l V 1 j=l J0 

where 

fe— 1 k— 1 /i \ 

Now consider the right-hand side of ([75)1 , again with 6> replaced by (9 — n: 

and compare it with ([80]) to obtain a representation for the Aj's in (|75|) : 

A fc+» v c ( m ) tt „* 

(fc + i-l)! ^ (l-a) fm0 11 3 W 

V ; meN*:|m|=i V ;lm,) j=l 

where No = N U {0}. Recall that n* = 1 — a + rij and consider the identity 

(l-q) (nj) n; (mj0 
(1 - «)(m 3 ) 

From ((33]) we also know that 



(1 -a + mj)^). 



v n+k+i , k = (hJ _ Vk+l '\, nwz\i k = k+i). 

(k + i- a*)(n) 



Thus (J75J) and (|82j) imply that 

fe 



ij=l / i=0 



meNj _1 :|m|=i 



{k + i- ak) {n) 

nifci^Q : | in | — 'i •- 

Now, for every m £ Nq -1 such that |m| = i, define m'j = + 1; then we can rewrite 

V k+i>k ( {k + i-l)\ x 



{k + i- l)\c{m)V k+i , k = VW ' fc ; ^ 1; - ■ i V fc +i-i,fe-i n (i 



— Ma,n m ). 

Kfc-H-l.fc-l 
Thus the right-hand side of (83) becomes
Vk+i,k 



£ 



E Ma,v(m') 



4 =0 1,k 1 m'eN fc - 1 :|m'|=fc+i-l 



(1 - a) (nfc) ]lj=i( TO / - oQk) 
(fc + i — a/c)(„) 



The term between square brackets is the $(n_1, \ldots, n_k)$-th moment of $k$ $[0,1]$-valued random variables $Y_1(m'), \ldots, Y_k(m')$ such that

$$\sum_{i=1}^{k} Y_i(m') = (W_k \mid i_k = k + i)$$

and, conditional on $\sum_{i=1}^{k} Y_i(m')$, the distribution of

$$\left( \frac{Y_1(m')}{\sum_{i=1}^{k} Y_i(m')}, \ldots, \frac{Y_k(m')}{\sum_{i=1}^{k} Y_i(m')} \right)$$

is a Dirichlet distribution with parameters $(m_1' - \alpha, \ldots, m_{k-1}' - \alpha, 1 - \alpha)$; therefore (84) completes the proof of part (i).



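To prove part (ii), we only have to notice that, from (84), it must follow that

$$\sum_{i=0}^{\infty} \frac{V_{k+i, k}}{V_{k+i-1, k-1}} \sum_{m' \in \mathbb{N}^{k-1} : |m'| = k+i-1} \mu_{\alpha,V}(m') = 1,$$

hence a version of the marginal probability of $i_k$ is, for every $k$,

$$\mathbb{P}(i_k = k + i) = \frac{V_{k+i, k}}{V_{k+i-1, k-1}} \sum_{m' \in \mathbb{N}^{k-1} : |m'| = k+i-1} \mu_{\alpha,V}(m'). \tag{85}$$

This can also be argued directly, simply by noting that, in an EGP$(\alpha, V)$,

$$\sum_{m' \in \mathbb{N}^{k-1} : |m'| = k+i-1} \mu_{\alpha,V}(m') = \mathbb{P}(i_{k-1} \le k + i - 1,\ i_k > k + i - 1)$$

and that

$$\frac{V_{k+i, k}}{V_{k+i-1, k-1}} = \mathbb{P}(i_k = k + i \mid i_{k-1} \le k + i - 1,\ i_k > k + i - 1).$$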

We want to find an expression for the inner sum of (|85|l . If we reconsider the term c(m) as in (|8ip (for 
m 6 Nq _1 : |m| = i), we see that 

£ c(m) 

m€Nj _1 :|m|=i 

is the coefficient of C in 



1 



(fc-1)! 



(1 - uO"- 1 * 



fc-i 



fe_1 C-1 



Thus 



£ c ( m ) = 



v-(fe-l) 



fc-1 

£ 



(-if 



fc+i+j-l 



(ja)[fc+i-i]- 



(86) 



Since 



E Ma,v( m ') = + ?: - !) ! V^+i-i.fc-i E c ( m ) 

m'eN fe - 1 :|m'|=fc+i-l 

then part (ii) is proved by comparison of ([86]) with (|85|) . 